[alto] Tsvart early review of draft-ietf-alto-performance-metrics-03

Discussion:

Brian Trammell

2018-04-09 16:35:30 UTC

Reviewer: Brian Trammell
Review result: On the Right Track

I've performed a (late, apologies) early TSV-ART review of
draft-ietf-alto-performance-metrics-03.

The set of metrics chosen by the document seems broadly useful and sane, and the
integration into ALTO makes sense. However, there are a few issues with the
details.

Periodic One Way Delay, RTT, and PDV are defined in terms of section 8, section
4, and section 5, respectively, of draft-ietf-ippm-initial-registry, which
specify active measurement test methodologies at layer 4 for one-way and
round-trip delay using UDP packets. It does not seem that these can be measured
directly using the routing technologies the authors have identified as their
source of information. Is the intention that dedicated active measurement
hardware be used to measure delay using UDP packets, or should these metrics
instead reference [RFC2679] and [RFC2681] and leave the methodology undefined?
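To make the distinction concrete, here is a minimal Python sketch of how the three quantities are derived from probe timestamps. It assumes synchronized clocks for the one-way case and, for RTT, a reflector that reports its own receive and transmit times (as TWAMP reflectors do). The function names are hypothetical, the computations are simplified (no loss handling, no statistics over a test period), and the PDV uses the minimum-delay reference described in RFC 5481; the registry entries in draft-ietf-ippm-initial-registry remain the normative definitions. Every input is a timestamp on a test packet, which is what the registry methodology presupposes.

```python
def one_way_delays(send_ts, recv_ts):
    """Per-packet one-way delay: receive timestamp minus send timestamp.

    Only meaningful if the sender and receiver clocks are synchronized.
    """
    return [recv - sent for sent, recv in zip(send_ts, recv_ts)]


def round_trip_time(t1, t2, t3, t4):
    """Round-trip delay for one probe, minus the reflector's turnaround time.

    t1: sender transmits, t2: reflector receives,
    t3: reflector transmits reply, t4: sender receives reply.
    """
    return (t4 - t1) - (t3 - t2)


def packet_delay_variation(delays):
    """Per-packet delay variation relative to the minimum observed delay."""
    d_min = min(delays)
    return [d - d_min for d in delays]


# Timestamps in milliseconds for three UDP probes (illustrative values).
delays = one_way_delays([0, 100, 200], [20, 125, 221])
print(delays)                          # [20, 25, 21]
print(packet_delay_variation(delays))  # [0, 5, 1]
print(round_trip_time(0, 20, 22, 45))  # 43
```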

The examples for these don't make much sense: the units are expressed in
seconds, but Internet-scale delays are generally millisecond-scale, and the
examples given contain only integers. Similarly, packet loss rate is given as a
percentage, but there are wide variations in usability between a path with 0%,
1%, and 2% packet loss. Is this simply an issue with the examples?
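The resolution problem is easy to see numerically. The Python sketch below rounds typical delays and loss rates to whole units; the helper names and sample values are illustrative assumptions, not taken from the draft. Whole seconds collapse every realistic delay to zero, and whole percentage points cannot distinguish a lossless path from one losing 0.4% of packets.

```python
def as_whole_seconds(delays_s):
    """Round delays (in seconds) to whole seconds, as integer-only examples imply."""
    return [round(d) for d in delays_s]


def as_whole_milliseconds(delays_s):
    """Round delays (in seconds) to whole milliseconds."""
    return [round(d * 1000) for d in delays_s]


def as_whole_percent(loss_fractions):
    """Round loss fractions (0.0 to 1.0) to whole percentage points."""
    return [round(p * 100) for p in loss_fractions]


def as_hundredths_of_percent(loss_fractions):
    """Express loss fractions as percentages with two decimal places."""
    return [round(p * 100, 2) for p in loss_fractions]


delays = [0.012, 0.045, 0.180]        # 12 ms, 45 ms, 180 ms
losses = [0.0, 0.004, 0.009, 0.014]   # 0%, 0.4%, 0.9%, 1.4%

print(as_whole_seconds(delays))           # [0, 0, 0]
print(as_whole_milliseconds(delays))      # [12, 45, 180]
print(as_whole_percent(losses))           # [0, 0, 1, 1]
print(as_hundredths_of_percent(losses))   # [0.0, 0.4, 0.9, 1.4]
```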

The hop count metric is underspecified: are these IP-experienced hops at layer
3, as can be measured by traceroute?
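If the intended answer is "layer 3 hops as seen by traceroute", the counting rule can be stated precisely. This Python sketch assumes a caller-supplied `probe(ttl)` function (hypothetical; a real one would send a UDP probe with that TTL and wait for an ICMP reply) and uses simulated responders instead of real sockets. It counts the destination as the last hop, which is only one of the conventions the document would need to choose between; devices below IP, such as layer 2 switches, are not counted at all.

```python
from typing import Callable, List, Optional

# A probe sends one packet with the given TTL and returns the address of the
# node that answered (a router whose TTL expired, or the destination itself),
# or None if no reply came back.
Probe = Callable[[int], Optional[str]]


def layer3_hop_count(probe: Probe, destination: str, max_ttl: int = 30) -> Optional[int]:
    """Count IP hops traceroute-style: raise the TTL one step at a time until
    the destination answers. Returns None if it is not reached within max_ttl."""
    for ttl in range(1, max_ttl + 1):
        if probe(ttl) == destination:
            return ttl
    return None


def simulated_probe(path: List[Optional[str]]) -> Probe:
    """Hypothetical stand-in for a real prober. `path` lists the responder at
    each hop; None models a router that does not answer probes."""
    def probe(ttl: int) -> Optional[str]:
        return path[ttl - 1] if ttl <= len(path) else path[-1]
    return probe


# A silent router at hop 2 still counts as a hop.
path = ["192.0.2.1", None, "203.0.113.9"]
print(layer3_hop_count(simulated_probe(path), "203.0.113.9"))  # 3
```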

Nit: section 2.1 refers to [OSPF-TE], [ISIS-TE], [BGP-LS] and [BGP-PM], but
these labels do not appear in the references section. Please use consistent
reference labels.

Thanks, cheers,

Brian

Brian Trammell (IETF)

2018-04-12 19:47:56 UTC

On further review, I have an additional question about the metrics specified in the document:

Max reservable (sec 8.1) and residue (sec 8.2) bandwidth seem to be fairly straightforward to calculate in a stable manner, as they are essentially matters of router configuration, but I'm a little concerned about the temporal aspects of available and utilized bandwidth. On any link carrying congestion-controlled traffic that is not otherwise limited (e.g., by a tunnel reservation), even a small number of active bulk transfer flows will drive the utilized bandwidth to its maximum on short timescales, resulting in an almost uselessly noisy metric.

I presume, since this metric is taken straight from an IS-IS standards-track RFC, that this doesn't happen in practice, i.e., that the averaging interval is long enough to smooth out these short peaks. It would be nice if the document named a timescale, though.
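The timescale concern can be illustrated with a small Python sketch that averages per-bin byte counts over windows of different lengths. The function name, bin size, link capacity and traffic pattern are made-up illustrative values; neither RFC 7810 nor the draft is being quoted. With one-second windows the metric swings between 0 and 1, and only longer averaging windows produce a value stable enough to report, which is why naming a timescale matters.

```python
def windowed_utilization(bytes_per_bin, bin_s, capacity_bps, bins_per_window):
    """Average link utilization (0.0 to 1.0) over consecutive, non-overlapping windows.

    bytes_per_bin: bytes sent in each fixed-length measurement bin
    bin_s: length of one bin in seconds
    capacity_bps: link capacity in bits per second
    bins_per_window: number of bins averaged into one reported value
    """
    window_capacity_bits = capacity_bps * bin_s * bins_per_window
    utilization = []
    # Step through full windows only; a trailing partial window is dropped.
    for start in range(0, len(bytes_per_bin) - bins_per_window + 1, bins_per_window):
        window_bits = 8 * sum(bytes_per_bin[start:start + bins_per_window])
        utilization.append(window_bits / window_capacity_bits)
    return utilization


# A 1 Mb/s link sampled in 1 s bins; 125000 bytes fills one bin completely.
# Bulk transfers saturate the link in a few bins and leave it idle otherwise.
bins = [125000, 125000, 0, 0, 0, 0, 125000, 0]

print(windowed_utilization(bins, 1, 1_000_000, 1))  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
print(windowed_utilization(bins, 1, 1_000_000, 4))  # [0.5, 0.25]
print(windowed_utilization(bins, 1, 1_000_000, 8))  # [0.375]
```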

I'm also a little dubious that instantaneous bandwidth is a useful cost metric for a system like ALTO, because even on sane timescales the delay imparted by the system makes the information hard to act on in a useful way, unless the link is a large enough aggregate that the bandwidth doesn't change much over time.

Cheers,

Brian

Randriamasy, Sabine (Nokia - FR/Paris-Saclay)

2018-04-13 15:27:45 UTC

Hi Brian,

Thanks so much for your thorough review. We'll get back with answers and update proposals asap.
Cheers,
Sabine

Qin Wu

2018-04-21 06:25:44 UTC

Regarding timescale, I think we can refer to Section 7 of RFC 7810.
I do agree with your point on instantaneous bandwidth, though, so I propose removing available and utilized bandwidth from draft-ietf-alto-performance-metrics.
I hope this addresses your comment.

-Qin

Qin Wu

2018-04-21 06:14:45 UTC

Brian:
Apologies for the late reply, and thanks for the valuable review.
-----Original Message-----
From: Brian Trammell [mailto:***@trammell.ch]
Sent: April 10, 2018 0:36
To: tsv-***@ietf.org
Cc: ***@ietf.org; ***@ietf.org; draft-ietf-alto-performance-***@ietf.org
Subject: Tsvart early review of draft-ietf-alto-performance-metrics-03

Reviewer: Brian Trammell
Review result: On the Right Track

I've performed a (late, apologies) early TSV-ART review of draft-ietf-alto-performance-metrics-03.

The set of metrics chosen by the document seems broadly useful and sane, and the integration into ALTO makes sense. However, there are a few issues with the details.

Periodic One Way Delay, RTT, and PDV are defined in terms of section 8, section 4, and section 5, respectively, of draft-ietf-ippm-initial-registry, which specify active measurement test methodologies at layer 4 for one-way and round-trip delay using UDP packets. It does not seem that these can be measured directly using the routing technologies the authors have identified as their source of information. Is the intention that dedicated active measurement hardware be used to measure delay using UDP packets, or should these metrics instead reference [RFC2679] and [RFC2681] and leave the methodology undefined?

[Qin]: I think the measurement method is not limited to using routing technologies as the source of information. We also allow other sources of information, e.g., active measurement at layer 4, that let an ALTO server retrieve and derive the information needed to compute the metrics described in this document.
I will make this clear in the text. Thanks.
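For context on how a value from any source ends up in front of ALTO clients, the Python sketch below assembles a cost map response body in the shape defined by RFC 7285. The cost-metric name "delay-ow", the source labels "active-udp" and "igp-te", the preference order between them, the resource-id and the vtag are all hypothetical placeholders; the draft does not say how a server should choose between sources.

```python
import json

# Hypothetical policy: prefer an active-measurement value for a PID pair and
# fall back to a routing-derived (IGP TE) value when no probe result exists.
SOURCE_PREFERENCE = ["active-udp", "igp-te"]


def build_cost_map(measurements, cost_metric, network_map_id, vtag):
    """Build a cost map response body following the RFC 7285 structure.

    measurements: {(src_pid, dst_pid): {source_name: value}}
    """
    cost_map = {}
    for (src, dst), by_source in sorted(measurements.items()):
        for source in SOURCE_PREFERENCE:
            if source in by_source:
                cost_map.setdefault(src, {})[dst] = by_source[source]
                break
    return {
        "meta": {
            "dependent-vtags": [{"resource-id": network_map_id, "tag": vtag}],
            "cost-type": {"cost-mode": "numerical", "cost-metric": cost_metric},
        },
        "cost-map": cost_map,
    }


measurements = {
    ("PID1", "PID2"): {"igp-te": 30, "active-udp": 27.5},  # delay in ms
    ("PID2", "PID1"): {"igp-te": 31},
}
response = build_cost_map(measurements, "delay-ow", "my-default-network-map", "example-vtag")
print(json.dumps(response, indent=2))
```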

The examples for these don't make much sense: the units are expressed in seconds, but Internet-scale delays are generally millisecond-scale, and the examples given contain only integers. Similarly, packet loss rate is given as a percentage, but there are wide variations in usability between a path with 0%, 1%, and 2% packet loss. Is this simply an issue with the examples?
[Qin]: The reason for expressing units in seconds is to align with the measurement unit defined in, e.g., the one-way delay metric in draft-ietf-ippm-initial-registry. I think you are right, though; we can define a finer-grained measurement unit in this document. For packet loss rate, this is just an example; we can provide an example that makes more sense.

The hop count metric is underspecified: are these IP-experienced hops at layer 3, as can be measured by traceroute?
[Qin]: Yes, you are right. Traceroute is one mechanism we can leverage to measure hop count, e.g., by sending UDP probe messages. I can add more detailed text to specify hop count.

Nit: section 2.1 refers to [OSPF-TE], [ISIS-TE], [BGP-LS] and [BGP-PM], but these labels do not appear in the references section. Please use consistent reference labels.

[Qin]: Fixed, thanks.

Thanks, cheers,

Brian