TECHNICAL SPECIFICATION
5G;
Objective test methodologies for the evaluation of immersive
audio systems
(3GPP TS 26.260 version 15.0.0 Release 15)
3GPP TS 26.260 version 15.0.0 Release 15 / ETSI TS 126 260 V15.0.0 (2018-10)
Reference
RTS/TSGS-0426260vf00
Keywords
5G
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88
Important notice
The present document can be downloaded from:
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the
print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
If you find errors in the present document, please send your comment to one of the following services:
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.
© ETSI 2018.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
oneM2M logo is protected for the benefit of its Members.
GSM and the GSM logo are trademarks registered and owned by the GSM Association.
Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
Foreword
This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).
The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.
The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under
.
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
Introduction
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Objective Test Methodologies for Immersive Audio Systems
4.1 Objective Test Methodologies for Assessment of Immersive Audio Systems in the Sending Direction
4.1.1 Diffuse-field Send Frequency Response for Scene-based Audio
4.1.1.1 Introduction
4.1.1.2 Definition
4.1.1.3 Test method with periphonic array
4.1.1.3.1 Test Conditions
4.1.1.3.2 Measurement
4.1.1.4 Test method with loudspeaker array and turn table
4.1.1.4.1 Test Conditions
4.1.1.4.2 Measurement
4.1.2 Directional response measurement for scene-based audio
4.1.2.1 Definition
4.1.2.2 Test conditions
4.1.2.3 Measurement
4.2 Objective Test Methodologies for Assessment of Immersive Audio Systems in the Receiving Direction
4.2.1 Headset Binaural Diffuse-field Receive frequency response for Scene-based audio
4.2.1.1 Introduction
4.2.1.2 Definition
4.2.1.3 Test Conditions
4.2.1.4 Measurement
4.2.2 Nominal System Sensitivity in Receive Direction for Channel-based audio
4.2.2.1 Introduction
4.2.2.2 Definition
4.2.2.3 Test Conditions
4.2.2.4 Measurement
4.2.3 Motion to Sound Latency in Dynamic Binaural Rendering Systems
4.2.3.1 Introduction
4.2.3.2 Requirements
4.2.3.3 Calibration
4.2.3.4 Evaluation Environment
4.2.3.5 Data acquisition
4.2.3.6 Data Analysis
Annex A (normative): Order dependent directions
Annex B (informative): Change history
History
Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).
The contents of the present document are subject to continuing work within the TSG and may change following formal
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an
identifying change of release date and an increase in version number as follows:
Version x.y.z
where:
x the first digit:
1 presented to TSG for information;
2 presented to TSG for approval;
3 or greater indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections,
updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the document.
Introduction
Audio is a key component of an immersive multimedia experience and 3GPP systems are expected to deliver immersive
audio with a high Quality of Experience. However, industry agreed methods to assess the Quality of Experience for
immersive audio are relatively few and the present document seeks to address this gap by providing objective test
methods for the assessment of immersive audio.
1 Scope
The present document specifies objective test methodologies for 3GPP immersive audio systems including
channel-based, object-based and scene-based formats and hybrids of these formats. The objective test methods described in the
present document are applicable to audio capture, coding, transmission and rendering as indicated in their
corresponding clauses.
2 References
The following documents contain provisions which, through reference in this text, constitute provisions of the present
document.
- References are either specific (identified by date of publication, edition number, version number, etc.) or
non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same
Release as the present document.
[1] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications".
[2] J. Fliege and U. Maier: "A two-stage approach for computing cubature formulae for the sphere,"
Dortmund University, 1999.
[3] ISO 3745 - Annex A: "Acoustics - Determination of sound power levels and sound energy levels
of noise sources using sound pressure -- Precision methods for anechoic rooms and hemi-anechoic
rooms - Annex A: General procedures for qualification of anechoic and hemi-anechoic rooms".
[4] ISO 1996 Acoustics: "Description, measurement and assessment of environmental noise".
[5] ANSI S1.4: "Specifications for Sound Level Meters".
[6] ISO 3: "Preferred numbers – Series of preferred numbers".
3 Definitions, symbols and abbreviations
3.1 Definitions
For the purposes of the present document, the terms and definitions given in 3GPP TR 21.905 [1] and the following
apply. A term defined in the present document takes precedence over the definition of the same term, if any, in 3GPP
TR 21.905 [1].
example: text used to clarify abstract rules by applying them literally.
3.2 Symbols
For the purposes of the present document, the following symbols apply:
LAeq the sound level in decibels equivalent to the total A-weighted sound energy measured over a stated
period of time.
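As an illustration of the LAeq definition, the following minimal sketch computes an equivalent continuous level from sampled sound pressure. It assumes the samples are already A-weighted and calibrated in pascals; the A-weighting filter itself is not implemented here. The function names and the 20 µPa reference constant are choices made for this example, and the 78 dB ± 0.5 dB window is the playback target from clause 4.1.1.3.2.

```python
import numpy as np

P_REF = 20e-6  # assumed reference sound pressure in pascals (20 µPa)


def equivalent_level_db(pressure_pa):
    """Equivalent continuous level over the whole duration of the input.

    The input is assumed to be A-weighted sound pressure in pascals.
    """
    mean_square = np.mean(np.square(pressure_pa))
    return 10.0 * np.log10(mean_square / P_REF ** 2)


def level_within_target(level_db, target_db=78.0, tolerance_db=0.5):
    """Check a measured level against a target window (hypothetical helper)."""
    return abs(level_db - target_db) <= tolerance_db
```

In practice the averaging window would be the stated measurement period, for example the 30 s window used when setting the playback level.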
3.3 Abbreviations
For the purposes of the present document, the abbreviations given in 3GPP TR 21.905 [1] and the following apply. An
abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in
3GPP TR 21.905 [1].
4 Objective Test Methodologies for Immersive Audio Systems
4.1 Objective Test Methodologies for Assessment of Immersive Audio Systems in the Sending Direction
4.1.1 Diffuse-field Send Frequency Response for Scene-based Audio
4.1.1.1 Introduction
This test is applicable to UEs capturing scene-based audio (e.g. First and Higher Order Ambisonics).
NOTE: Currently, the test method uses a periphonic loudspeaker array for generation of a diffuse-field. Additional
loudspeaker setups for the derivation of the diffuse sound field are under consideration.
General test conditions
Free-field propagation conditions
- The test environment shall contain a free-field volume, wherein free-field sound propagation conditions shall be
observed.
- The free-field sound propagation conditions shall be observed down to a frequency of 200 Hz or less.
- Qualification of the free-field volume shall be performed using the method and limits for deviation from ideal
free-field conditions described in [3].
Test environment noise floor
Within the free-field volume, the equivalent continuous sound level of the test environment in each 1/3rd octave band,
$L_{eq}(f)$, shall be less than the limits of the NR10 curve, following the noise rating determination procedures in [4].
4.1.1.2 Definition
The Diffuse-field Send Frequency Response for Scene-based Audio is defined as the transfer function, $G(f)$, between:
$\hat{P}(f)$, the estimated sound pressure magnitude spectrum obtained from a diffuse-field scene-based audio capture
and reference synthesis at the geometric center of a free-field volume; and
$P(f)$, the sound pressure magnitude spectrum obtained from a diffuse-field microphone recording the same
diffuse field at the origin of a spherical coordinate system.
Figure 1 describes a typical block diagram for the scene-based audio sending direction with measurement points when
using a periphonic loudspeaker array.
Figure 1: Scene-based audio capture block diagram for sending direction measurements
Definition of Equivalent Spatial Domain
The equivalent spatial domain representation, $w(t)$, of an $N$th-order Ambisonics soundfield representation $c(t)$ is obtained
by rendering $c(t)$ to $K$ virtual loudspeaker signals $w_j(t)$, $1 \le j \le K$, with $K = (N+1)^2$. The respective virtual loudspeaker
positions are expressed by means of a spherical coordinate system, where each position lies on the unit sphere, i.e., a
radius of 1. Hence, the positions can be equivalently expressed by order-dependent directions
$\Omega_j^{(N)} = (\theta_j^{(N)}, \phi_j^{(N)})$, $1 \le j \le K$,
where $\theta_j^{(N)}$ and $\phi_j^{(N)}$ denote the inclinations and azimuths, respectively. These directions are defined according to [2]
and reproduced in Annex A for convenience.
The rendering of $c(t)$ into the equivalent spatial domain can be formulated as a matrix multiplication:
$w(t) = (\Psi^{(N,N)})^{-1} \cdot c(t)$,
where $(\cdot)^{-1}$ denotes the inversion.
The matrix $\Psi^{(N,N)}$ of order $N$ with respect to the order-dependent directions $\Omega_j^{(N)}$ is defined by:
$\Psi^{(N,N)} := [S_1^{(N)} \; S_2^{(N)} \; \ldots \; S_K^{(N)}]$,
with:
$S_j^{(N)} := [S_0^0(\Omega_j^{(N)}) \; S_1^{-1}(\Omega_j^{(N)}) \; S_1^0(\Omega_j^{(N)}) \; S_1^1(\Omega_j^{(N)}) \; \ldots \; S_N^N(\Omega_j^{(N)})]^T$,
where $S_n^m(\cdot)$ represents the real valued spherical harmonics of the order $n$ and degree $m$.
The matrix $\Psi^{(N,N)}$ is invertible so that the HOA representation $c(t)$ can be converted back from the equivalent spatial
domain by:
$c(t) = \Psi^{(N,N)} \cdot w(t)$
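The conversion between the Ambisonics representation and the equivalent spatial domain can be sketched for first order (N = 1, K = 4) as follows. This is an illustration under stated assumptions, not the normative procedure: the four directions used here are the vertices of a regular tetrahedron rather than the order-dependent directions of [2], and the real spherical harmonics use an orthonormal normalization that the present document does not specify. All function and variable names are hypothetical.

```python
import numpy as np


def real_sh_first_order(theta, phi):
    """Column S_j for N = 1, ordered (n, m) = (0,0), (1,-1), (1,0), (1,1).

    theta is the inclination and phi the azimuth, both in radians.
    Orthonormal normalization is an assumption of this sketch.
    """
    a = np.sqrt(1.0 / (4.0 * np.pi))
    b = np.sqrt(3.0 / (4.0 * np.pi))
    return np.array([
        a,
        b * np.sin(theta) * np.sin(phi),
        b * np.cos(theta),
        b * np.sin(theta) * np.cos(phi),
    ])


# Assumed stand-in directions: vertices of a regular tetrahedron.
_xyz = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3.0)
_theta = np.arccos(_xyz[:, 2])
_phi = np.arctan2(_xyz[:, 1], _xyz[:, 0])

# Mode matrix with one column per virtual loudspeaker direction.
PSI = np.column_stack([real_sh_first_order(t, p) for t, p in zip(_theta, _phi)])


def hoa_to_esd(c):
    """w(t) = PSI^-1 c(t); c has shape (K, number of samples)."""
    return np.linalg.solve(PSI, c)


def esd_to_hoa(w):
    """c(t) = PSI w(t)."""
    return PSI @ w
```

Solving the linear system instead of forming the explicit inverse is a numerical design choice; both give the same result for a well-conditioned mode matrix.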
4.1.1.3 Test method with periphonic array
4.1.1.3.1 Test Conditions
Periphonic loudspeaker array
a) A periphonic loudspeaker array shall be placed within the free-field volume with the geometric center of the
periphonic loudspeaker array coinciding with the geometric center of the free-field volume.
b) The periphonic loudspeaker array shall have a radius greater than or equal to 1 meter.
c) The periphonic loudspeaker array shall be composed of $(N+1)^2$ coaxial loudspeaker elements. Each of the
$(N+1)^2$ coaxial loudspeaker elements shall be equalized (if necessary) and level compensated to conform with the
operational room response curve limits given in [5] Section 8.3.4.1. N should be equal to or greater than the
maximum Ambisonics order supported by the device under test (DUT), e.g. N ≥ 4 for a DUT supporting 4th-order
Ambisonics capture.
d) The $(N+1)^2$ coaxial loudspeaker elements shall be positioned according to the azimuth and elevation coordinates
given in Annex A.
e) All coaxial loudspeaker elements shall be oriented such that their acoustic axes intersect at the geometric center
of the free-field volume.
f) The radius of each coaxial loudspeaker element shall be such that, at the geometric center of the free-field
volume, the far field approximation for the coaxial loudspeaker axial pressure amplitude decay holds true.
4.1.1.3.2 Measurement
Reference Spectrum measurement for periphonic loudspeaker array method
a) A diffuse-field / random incidence, or multi-field microphone is mounted in the free-field volume such that the
tip of the microphone corresponds to the geometric center of the free-field volume and the geometric center of
the periphonic loudspeaker array.
NOTE 1: Diffuse-field / random incidence microphones are described in [5].
b) $(N+1)^2$ decorrelated pink noise signals are played simultaneously over each of the $(N+1)^2$ coaxial loudspeakers of
the periphonic loudspeaker array.
c) The playback level is adjusted such that the LAeq, measured over a 30 s time window at the geometric center of
the periphonic loudspeaker array, is equal to 78 dB SPL(A) ± 0.5 dB.
d) The reference sound pressure at the geometric center of the free-field volume, p(t), is captured with the
diffuse-field or multi-field microphone.
e) The magnitude spectrum of the reference sound pressure, P(f), is calculated for the 1/12th octave intervals as
given by the R40 series of preferred numbers in [6].
NOTE 2: For ideal (calibrated) loudspeakers, the P(f) spectra should have equal energy in each 1/12th octave
interval.
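The pink noise excitation and the 1/12th octave analysis can be sketched as below. Two assumptions are made that are not taken from the present document: the band centres are the unrounded values 10^(k/40) Hz (the R40 series in [6] lists rounded values), with band edges half a step either side, and the pink noise is synthesized with a 1/sqrt(f) magnitude and uniformly random phase. Independent random phases per loudspeaker feed give statistically independent signals. The function names are hypothetical.

```python
import numpy as np


def pink_noise(n_samples, rng):
    """Random-phase pink noise with unit RMS (magnitude proportional to 1/sqrt(f))."""
    freqs = np.fft.rfftfreq(n_samples)           # cycles per sample, freqs[0] == 0
    magnitude = np.zeros_like(freqs)
    magnitude[1:] = 1.0 / np.sqrt(freqs[1:])     # leave the DC bin at zero
    phase = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
    x = np.fft.irfft(magnitude * np.exp(1j * phase), n=n_samples)
    return x / np.sqrt(np.mean(x ** 2))


def twelfth_octave_spectrum(x, fs, f_min=200.0, f_max=10000.0):
    """Return (band centres in Hz, band magnitudes) on a 10**(k/40) grid."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    k = np.arange(np.ceil(40.0 * np.log10(f_min)),
                  np.floor(40.0 * np.log10(f_max)) + 1.0)
    centres = 10.0 ** (k / 40.0)
    lower = centres * 10.0 ** (-1.0 / 80.0)
    upper = centres * 10.0 ** (1.0 / 80.0)
    band_power = np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                           for lo, hi in zip(lower, upper)])
    return centres, np.sqrt(band_power)
```

With this synthesis the band levels come out nearly flat across the analysed range, which is the behaviour NOTE 2 describes for ideal loudspeakers.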
Estimated Spectrum measurement
a) The scene-based audio capture device under test is mounted in the free-field volume such that its geometric
center coincides with the geometric center of the free-field volume and the geometric center of the periphonic
loudspeaker array.
b) $(N+1)^2$ decorrelated pink noise signals are played simultaneously over each of the $(N+1)^2$ coaxial loudspeakers of
the periphonic loudspeaker array. The pink noise signals shall be identical to the signals used for the reference
spectrum measurement.
c) The B-format scene-based audio format representation (compressed or uncompressed, depending on the use case
being tested) is stored for offline analysis.
d) The B-format scene-based audio format representation is uncompressed (if necessary) and converted to an
equivalent spatial domain representation of order $N_{DUT}$ (B-Format to ESD conversion in Figure 1), where $N_{DUT}$
corresponds to the Ambisonics order of the device under test.
e) $\hat{p}(t)$, the estimate of the sound field at the geometric center of the free-field volume and periphonic loudspeaker
array, is synthesized using the equivalent spatial domain representation of order $N_{DUT}$.
NOTE 3: $\hat{p}(t)$ can be taken from the W component of the B-Format signal, as an alternative to implementing the
B-Format to ESD conversion in step d).
f) The magnitude spectrum of the estimated sound pressure, $\hat{P}(f)$, is calculated for the 1/12th octave intervals as
given by the R40 series of preferred numbers in [6].
Calculation of send frequency response for scene-based audio
The send frequency response for scene-based audio, G(f), is calculated as $G(f) = \hat{P}(f) / P(f)$.
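A minimal sketch of this final step is given below. It assumes that the response is the per-band ratio of the estimated magnitude spectrum to the reference magnitude spectrum, and it expresses the result in decibels, which is a presentation choice of this example rather than something stated in the clause. The function name is hypothetical.

```python
import numpy as np


def send_frequency_response_db(p_hat_mag, p_ref_mag):
    """Per-band ratio of estimated to reference magnitude spectrum, in dB.

    Both inputs are magnitude spectra evaluated on the same 1/12th octave bands.
    """
    p_hat_mag = np.asarray(p_hat_mag, dtype=float)
    p_ref_mag = np.asarray(p_ref_mag, dtype=float)
    return 20.0 * np.log10(p_hat_mag / p_ref_mag)
```

For example, an estimated magnitude twice the reference in a band gives about +6 dB for that band, and equal magnitudes give 0 dB.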
4.1.1.4 Test method with loudspeaker array and turn table
4.1.1.4.1 Test Conditions
Loudspeaker array
a) A calibrated loudspeaker array shall be placed within the free-field volume.
b) The loudspeaker array shall comprise one or several semi-arcs having a radius greater than or equal to 1 meter.
c) The loudspeaker array shall be composed of N+1 loudspeaker elements.
d) Each loudspeaker in the array shall be calibrated with a frequency response of [at least 100 Hz-20,000 Hz] and
minimum phase response.
e) The coordinates of the loudspeaker elements are defined according to a Gaussian spherical grid of order N.
Turn table
a) A turn table with a resolution of [0.5°] shall be used. The rotation axis of the turn table and the vertical axis of
the semi-arcs shall be aligned. The turn table shall be adjusted in height so that the device under test is positioned
at the geometric center of the loudspeaker array.
b) For measurement, an azimuth step of 180/(N+1) degrees shall be used.
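The measurement directions implied by these conditions can be sketched as follows. The sketch assumes that a Gaussian spherical grid of order N places the N+1 loudspeaker inclinations at the Gauss-Legendre nodes, and that a single semi-arc is rotated through a full turn, giving 2(N+1) turn table positions at the stated azimuth step. Both are assumptions of this example; the function name is hypothetical.

```python
import numpy as np


def gaussian_grid_degrees(order):
    """Return (inclinations, azimuths) in degrees for a grid of the given order."""
    # Gauss-Legendre nodes are the cosines of the loudspeaker inclinations.
    nodes, _weights = np.polynomial.legendre.leggauss(order + 1)
    inclinations = np.sort(np.degrees(np.arccos(nodes)))
    # Turn table positions at a step of 180/(N+1) degrees over a full turn.
    step = 180.0 / (order + 1)
    azimuths = np.arange(2 * (order + 1)) * step
    return inclinations, azimuths
```

For N = 1 this yields two inclinations, symmetric about the horizontal plane, and four azimuths spaced 90 degrees apart.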
4.1.1.4.2 Measurement
Reference Spectrum measurement
a) A diffuse-field / random incidence, or multi-field microphone is mounted in the free-field volume such that the
tip of the microphone corresponds to the geometric center of the free-field volume and the geometric center of
the loudspeaker array.
NOTE 1: Diffuse-field / random incidence microphones are described in [5].
Repeat steps b) and c) with an azimuth angular resolution of 180/(N+1) degrees:
b) An exponential sweep sine signal is played over each of the N+1 loudspeakers of the loudspeaker array.
c) The impulse response at the geometric center of the loudspeaker array is measured for each loudspeaker
position.
d) The magnitude spectrum of the reference sound pressure, P(f), is calculated for the 1/12th octave intervals as
given by the R40 series of preferred numbers in [6].
NOTE 2: For ideal (calibrated) loudspeakers, the P(f) spectra should have equal energy in each 1/12th octave
interval.
Estimated Spectrum measurement
a) The scene-based audio capture device under test is mounted in the free-field volume such that its geometric
center coincides with the geometric center of the free-field volume and the geometric center of the loudspeaker
array.
b) Repeat steps c) and d) with an azimuth angular resolution of 180/(N+1) degrees:
c) An exponential sweep sine signal is played over each of the N+1 loudspeakers of the loudspeaker array. The
sweep signals shall be identical to the signals used for the reference spectrum measurement.
d) The impulse response at the geometric center of the loudspeaker array is measured for each loudspeaker
position.
e) The magnitude spectrum of the estimated sound pressure, $\hat{P}(f)$, is calculated for the 1/12th octave intervals as
given by the R40 series of preferred numbers in [6].
Calculation of send frequency response for scene-based audio
The send frequency response for scene-based audio, G(f), is calculated as $G(f) = \hat{P}(f) / P(f)$.
4.1.2 Directional response measurement for scene-based audio
4.1.2.1 Definition
The directional response for scene-based audio is defined as the transfer function, represented as an impulse response,
$h(\theta_i, \phi_i)$, between a device under test and a loudspeaker located at an equal distance r and L predefined directions,
$(\theta_i, \phi_i)$, $i = 1, \ldots, L$.
4.1.2.2 Test conditions
Free-field propagation conditions
- The test environment shall contain a free-field volume, wherein free-field sound propagation conditions shall be
observed.
- The free-field sound propagation conditions shall be observed down to a frequency of 200 Hz.
Test environment noise floor
The equivalent continuous sound level of the test environment in each 1/3rd octave band, $L_{eq}(f)$, shall be less than the
limits of the NR10 curve, following the noise rating determination procedures in [4].
Loudspeaker array
A real or simulated loudspeaker array comprising L loudspeakers located at a set of predefined directions $(\theta_i, \phi_i)$,
$i = 1, \ldots, L$, from the geometric center of the loudspeaker array shall be used.
4.1.2.3 Measurement
For each loudspeaker position $(\theta_i, \phi_i)$, $i = 1, \ldots, L$, the following procedure shall be used:
a) An exponential sweep sine test signal is played over the loudspeaker.
NOTE: The impact of codec on the exponential sweep sine test signal needs to be verified before performing the
measurements. An activation signal may be needed.
b) The impulse response $h(\theta_i, \phi_i)$ at the geometric center of the loudspeaker array is measured.
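The two steps above can be sketched for a single loudspeaker direction as follows. The present document does not prescribe how the impulse response is extracted from the recorded sweep; this example assumes a regularized spectral division of the recording by the sweep, and the sweep limits, duration and regularization constant are illustrative values only. The function names are hypothetical.

```python
import numpy as np


def exponential_sweep(f_start, f_stop, duration_s, fs):
    """Exponential sine sweep from f_start to f_stop (Hz) at sample rate fs."""
    t = np.arange(int(round(duration_s * fs))) / fs
    log_ratio = np.log(f_stop / f_start)
    phase = (2.0 * np.pi * f_start * duration_s / log_ratio
             * (np.exp(t * log_ratio / duration_s) - 1.0))
    return np.sin(phase)


def impulse_response(recorded, sweep, n_taps, rel_eps=1e-8):
    """Estimate the first n_taps of the impulse response by spectral division.

    The FFT length covers both signals so the division corresponds to a
    linear, not circular, deconvolution. rel_eps is an assumed regularization
    that avoids dividing by near-zero bins outside the sweep band.
    """
    n_fft = recorded.size + sweep.size
    s = np.fft.rfft(sweep, n_fft)
    r = np.fft.rfft(recorded, n_fft)
    power = np.abs(s) ** 2
    h = np.fft.irfft(r * np.conj(s) / (power + rel_eps * power.max()), n_fft)
    return h[:n_taps]
```

Repeating this for each of the L directions gives the set of impulse responses that make up the directional response.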
4.2 Objective Test Methodologies for Assessment of Immersive Audio Systems in the Receiving Direction
4.2.1 Headset Binaural Diffuse-field Receive frequency response for Scene-based audio
4.2.1.1 Introduction
This test is applicable to UEs rendering scene-based audio (e.g. First and Higher Order Ambisonics) over a binaural
headset.
4.2.1.2 Definition
The Headset Binaural Diffuse-field Receive Frequency Response for Scene-based Audio (for left and right ears) is
defined as the transfer function, $G_{L,R}(f)$, between:
a) $P_{L,R}(f)$, the binaurally recorded sound pressure magnitude spectra, obtained when a diffuse field signal in the
equivalent spatial domain representation, w(t), is played o
...