miVirtualSeat: Semantics-aware Content Distribution for Immersive Meeting Environments

People

  • Klara Nahrstedt, University of Illinois, Urbana-Champaign (UIUC): Principal Investigator
  • Ramesh K. Sitaraman, University of Massachusetts, Amherst (UMass): Principal Investigator
  • Michael H. Zink, University of Massachusetts, Amherst (UMass): Principal Investigator
  • Jacob Chakareski, New Jersey Institute of Technology (NJIT): Principal Investigator

Research Assistants

  • Bo Chen (Postdoctoral Fellow) – UIUC
  • Qian Zhou (Postdoctoral Fellow) – UIUC
  • Mingyuan Wu (PhD Student) – UIUC
  • Shiv Tripedi (Undergraduate Student) – UIUC
  • Yuhan Lu (Undergraduate Student) – UIUC
  • John Murray (MS Student) – UMass
  • Siamak Beikzadeh (PhD Student) – UMass
  • Tianyu (Kevin) Chen (PhD Student) – UMass
  • Mohammad Reza Elahpour (PhD Student) – NJIT
  • Amirhoseing Aghaei (PhD Student) – NJIT
  • Simran Singh (Postdoctoral Fellow) – NJIT

Project Overview

The pandemic has greatly reinforced the need for virtual meetings in both work-related and social settings. It has also highlighted the dire lack of video conferencing tools that can simulate the rich immersive experience of in-person meetings, with current tools often leading to “zoom fatigue” caused by having to interact over an unnatural communication medium. This collaborative project brings together investigators from the University of Illinois, Urbana-Champaign, the University of Massachusetts, Amherst, and the New Jersey Institute of Technology to research, build, and evaluate a distributed system, called miVirtualSeat, that more closely simulates the immersive experience of in-person meetings, including physical and virtual participants in a physical meeting space.

The project is focused on key research challenges in providing an immersive meeting experience where physical and virtual participants interact with each other in a physical meeting room. Some participants are virtually present in the physical meeting room, but physically located at remote sites with only limited compute and network resources. The challenges are (a) detecting, tracking, and localizing distributed physical and virtual 360-degree avatars and objects in the joint immersive scene in real time, (b) reducing the bandwidth and latency of delivering integrated and synchronized 360-degree, volumetric, and 2D/3D video, and ambisonic audio, and (c) ensuring good quality of experience in the form of natural interaction between physical and virtual participants.

The project addresses an immediate and important need for a post-pandemic society to enable immersive hybrid meetings that arise in the context of classrooms, conferences, office meetings, and social gatherings. miVirtualSeat will enable these meetings with a physical meeting room and remote sites situated at each of the investigators’ institutions. The outcome of the project will be new undergraduate and graduate courses in the emerging field of advanced mixed reality immersive environments. Through outreach activities, the project members will showcase miVirtualSeat and expose the broader public to the capabilities of distributed AR/VR systems.

Augmented Scene Graph Generation for content analysis. One of the design goals of the semantics-aware teleconferencing system is to extract and analyze valuable contextual information from the physical meeting environment. Given multiple sensor streams in the physical room as input, the system runs ML-based scene analysis algorithms that detect in-room events of pre-defined categories in real time and decompose them into people, meeting-related objects, and the relationships between them. This compositional information is formulated as a structured graph representation, in which people and meeting-related objects are detected and localized as nodes, and the relationships between people and objects are recorded and classified as edges between nodes. The graph representation feeds a foveated rendering pipeline in our teleconferencing system, preserving the video quality of important sub-regions while saving bandwidth. Moreover, the semantics-aware system can potentially use the event information extracted from the joint immersive environment to improve the virtual participant’s experience by switching camera views and automatically starting interactive sessions on selected events.
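
To make the representation concrete, below is a minimal Python sketch of such a per-frame scene graph. The class names, fields, and the salient_regions helper are illustrative assumptions rather than the project’s actual data structures.

  # Minimal sketch of a per-frame scene graph; class and relation names
  # (e.g. "writing_on") are illustrative placeholders, not the project's API.
  from dataclasses import dataclass, field
  from typing import List, Tuple


  @dataclass
  class SceneNode:
      """A detected person or meeting-related object, localized by a bounding box."""
      node_id: int
      category: str                      # e.g. "person", "laptop", "whiteboard"
      bbox: Tuple[int, int, int, int]    # (x, y, width, height) in the camera frame
      confidence: float


  @dataclass
  class SceneEdge:
      """A classified relationship, e.g. person -- "writing_on" --> whiteboard."""
      subject_id: int
      object_id: int
      relation: str
      confidence: float


  @dataclass
  class SceneGraph:
      """Structured representation of in-room semantics for a single frame."""
      timestamp: float
      nodes: List[SceneNode] = field(default_factory=list)
      edges: List[SceneEdge] = field(default_factory=list)

      def salient_regions(self, min_conf: float = 0.5) -> List[Tuple[int, int, int, int]]:
          """Bounding boxes of nodes involved in a confident relationship, i.e. the
          sub-regions whose video quality the foveated renderer should preserve."""
          involved = set()
          for e in self.edges:
              if e.confidence >= min_conf:
                  involved.update((e.subject_id, e.object_id))
          return [n.bbox for n in self.nodes if n.node_id in involved]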

WebRTC/RTMP streaming investigation for content delivery. A cross-campus real-time video streaming pipeline serves as a crucial submodule in this project. Real-Time Messaging Protocol (RTMP) and Web Real-Time Communication (WebRTC) solutions have been investigated and implemented. The RTMP solution is built on the open-source media streaming software NGINX: upstreaming is done with FFmpeg or OBS, while downstream playback is done with FFplay. This solution achieves a latency of approximately 700 ms when tested on servers with public IP addresses. In the WebRTC solution, peers first contact a Session Traversal Utilities for NAT (STUN) server to discover their public IP addresses. The peers then exchange Session Description Protocol (SDP) information through a connection to a signaling server before the peer-to-peer connection is established, and fall back to a Traversal Using Relays around NAT (TURN) server if the direct connection fails. This solution achieves a latency of approximately 400 ms. Further experiments on latency under different network conditions and NAT types are planned, and additional designs for streaming group meetings are under exploration.
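
To make the WebRTC flow concrete, below is a minimal Python sketch of the offer side of this exchange, written against the open-source aiortc library (an assumption; the deployed system may use a different WebRTC stack). The STUN URL, signaling host and port, and media source are placeholders.

  # Sketch of the offer side of a WebRTC session: STUN for public-address
  # discovery, SDP offer/answer over a signaling channel, then peer-to-peer media.
  import asyncio

  from aiortc import (RTCConfiguration, RTCIceServer, RTCPeerConnection,
                      RTCSessionDescription)
  from aiortc.contrib.media import MediaPlayer
  from aiortc.contrib.signaling import TcpSocketSignaling


  async def run_offer() -> None:
      # A STUN server lets the peer discover its public address; a TURN relay
      # could be added as another RTCIceServer (with credentials) for the fallback case.
      config = RTCConfiguration(iceServers=[RTCIceServer(urls="stun:stun.l.google.com:19302")])
      pc = RTCPeerConnection(configuration=config)

      # Outgoing video track; a camera device or capture pipeline would replace the file.
      player = MediaPlayer("meeting_room.mp4")
      pc.addTrack(player.video)

      # Simple TCP signaling channel standing in for the project's signaling server.
      signaling = TcpSocketSignaling("signaling.example.org", 9999)
      await signaling.connect()

      # Create the SDP offer, send it over the signaling channel, and wait for the answer.
      await pc.setLocalDescription(await pc.createOffer())
      await signaling.send(pc.localDescription)
      answer = await signaling.receive()
      if isinstance(answer, RTCSessionDescription):
          await pc.setRemoteDescription(answer)

      # Keep the peer-to-peer connection alive while media flows, then clean up.
      await asyncio.sleep(60)
      await pc.close()
      await signaling.close()


  if __name__ == "__main__":
      asyncio.run(run_offer())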

HoloLens 2 investigations for content receiving (in progress). HoloLens 2 is a pair of mixed reality smartglasses developed and manufactured by Microsoft. It is adopted as the primary mixed reality device for providing a rich immersive meeting experience in this project.

The compatibility of WebRTC with HoloLens 2 is under exploration and could potentially be achieved through the Unity integration of the open-source library MixedReality-WebRTC. Besides the mixed reality video streaming interface, real-time avatar rendering applications on HoloLens 2 are also planned for this teleconferencing system.

Publications

Software

Software is being uploaded to https://github.com/ECE-ZINK/miVirtualSeat

Funding

This work was funded by the National Science Foundation under awards NSF 1835834 and NSF 2106592.