360 video is a form of virtual reality (VR) that allows the viewer to experience media content in an immersive fashion. In contrast to traditional video, 360 video is recorded with a special camera that captures the complete surroundings from almost all directions. Viewers consuming such a video can select the direction they are looking at by using a pointing device on a regular display or through head movement using a head-mounted device. This new format allows a viewer to change their viewing direction when watching the video, e.g., a viewer can watch a sporting event from multiple perspectives on the field. However, creating, storing, and disseminating 360 videos at a large scale over the Internet poses significant challenges. These challenges are the focus of this project, which will develop a new system and framework, called mi360World, to enable smooth delivery of and interaction with 360 video by any user on the Internet. This project, if successful, will significantly improve 360 video delivery, and will enable new and much richer educational, training and entertainment experiences. It will also help train a new class of multimedia systems researchers and practitioners.
Delivering high-quality, personalized 360 video over the Internet to a globally distributed set of users is an unsolved scientific problem that entails the following challenges: 1) ultra-high bandwidth; 2) ultra-low delay; 3) view adaptation (to user head movement); 4) complex video metadata and delivery; 5) video Quality of Experience (QoE). Traditional video QoE has been studied extensively over the years; however, what contributes to 360 video QoE is much less understood and will require conceiving and measuring new metrics. The proposed mi360World system incorporates three major research thrusts to address the above challenges. The first is a video creation thrust that enables personalized viewing by generating navigation graphs and cinematographic rules, while maintaining a high QoE and reducing cybersickness. The construction of navigation graphs and the inclusion of cinematographic rules represent the main innovations of this project, and are encapsulated in a three-layered metadata representation of the 360 video: a transport layer, a semantic layer, and an interactive story-telling layer. The second thrust focuses on scalable distribution of 360 videos to a global set of diverse viewers, utilizing navigation graphs and cinematographic rules for highly efficient transition-predictive prefetching and caching. The third thrust focuses on QoE, with the goal of devising novel QoE metrics and evaluation methods to assess cybersickness. System architectures and algorithms will be extensively evaluated through simulation, emulation, and benchmarking on testbeds to assess the success of the proposed research.
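To make the idea of transition-predictive prefetching more concrete, the following is a minimal sketch, not the mi360World implementation. It assumes a navigation graph can be approximated as a table of view-to-view transition probabilities estimated from past viewer traces. The view labels, the trace format, and the function names `build_navigation_graph` and `views_to_prefetch` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_navigation_graph(traces):
    """Estimate view-to-view transition probabilities from viewer traces.

    `traces` is a list of view-label sequences, one per viewer
    (a hypothetical format, assumed here for illustration).
    """
    counts = defaultdict(Counter)
    for trace in traces:
        # Count every consecutive (current view, next view) pair.
        for current_view, next_view in zip(trace, trace[1:]):
            counts[current_view][next_view] += 1

    graph = {}
    for view, successors in counts.items():
        total = sum(successors.values())
        graph[view] = {nxt: n / total for nxt, n in successors.items()}
    return graph


def views_to_prefetch(graph, current_view, k=2):
    """Return the k most likely next views for the current view."""
    successors = graph.get(current_view, {})
    # Sort by descending probability; break ties by label for determinism.
    ranked = sorted(successors.items(), key=lambda item: (-item[1], item[0]))
    return [view for view, _ in ranked[:k]]


# Toy traces: each entry is the sequence of views one viewer watched.
traces = [
    ["front", "front", "left", "front"],
    ["front", "right", "right", "front"],
    ["front", "front", "left", "left"],
]
graph = build_navigation_graph(traces)
prefetch = views_to_prefetch(graph, "front", k=2)
print(prefetch)
```

In this toy setting, a cache or client would fetch only the views returned by `views_to_prefetch` ahead of time instead of all views, which is where the bandwidth saving would come from. Cinematographic rules are not modeled in this sketch.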
- Camera pose estimation for virtual tourism. Immersive virtual tours based on 360 cameras, showing famous outdoor scenery, are becoming increasingly desirable due to travel costs, pandemics, and other constraints. For a tour to feel immersive, a user must receive the view that accurately corresponds to her position and orientation in the virtual space, which requires the cameras’ orientations to be known. Outdoor tours involve numerous ultra-sparse cameras deployed across a wide area, making camera pose estimation challenging. We propose 360ViewPET, a novel strategy to automatically estimate the relative poses of ultra-sparse cameras (15 meters apart), with a mean error as low as 0.9 degrees.
- View synthesis for virtual tourism. This project also serves immersive 360 video tourism but focuses on view generation. Some existing works rely on teleportation and offer poor immersion because the viewer can only travel to positions where a camera is installed; other existing works use 3D modeling, which offers great immersion but cannot cover a wide area because the cameras must be densely deployed (centimeters or decimeters apart). We propose to use morphing for view synthesis, which strikes a good balance between immersion and wide-area coverage. Our experiments show that the intermediate views between two ultra-sparse cameras (15 meters apart) are synthesized with good viewing quality.
- Viewer management for 360 video live broadcast. 360 video is becoming an integral part of our content consumption through both video-on-demand and live broadcast services. However, live broadcast remains challenging because of the huge network bandwidth cost incurred if all 360 views are delivered to a large viewer population over diverse networks. We propose 360BroadView, a viewer management approach to viewport prediction in 360 video live broadcast. It designates some viewers on high-bandwidth networks as leading viewers, who help the others (lagging viewers) predict viewports during 360 video viewing and thereby save bandwidth. Our evaluation shows that it maintains the leading viewer population at a minimal yet necessary level for 97% of the time despite viewer churn during live broadcast, so that the system keeps functioning properly.
- A novel viewing mode for 360 video. A 360 video is often viewed in either Free Mode (the viewer manually controls her viewing direction) or Auto Mode (the server recommends a viewing direction at each moment). The drawback of Free Mode is that the viewer may miss important events outside her field of view; the drawback of Auto Mode is that the recommended viewing direction may not suit every viewer, because viewers hold diverse opinions about what is important. We are using a machine learning strategy to automatically infer the percentage of viewers who will like the recommended viewing direction, and we enable Auto Mode only if that percentage is high. The work will be evaluated using both subjective and objective metrics. For subjective evaluation, we have developed an online user study platform (https://monet360video.web.illinois.edu/).
- Better viewing trajectory recommendation in 360 video. When recommending a viewing trajectory, existing works choose the viewing direction with the highest saliency score at each moment and string them together to get a trajectory. This machine-generated trajectory is unlike a human-videographer-made one and is thus unnatural. In a human-videographer-made trajectory, the viewing direction does not always target the most salient object; instead, it keeps switching between different objects to tell a rich and interesting story. We are using a machine learning strategy to learn the object switching pattern from numerous conventional 2D videos and use it to guide the generation of recommended viewing trajectories in 360 video.
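The baseline described in the last item above, picking the most salient viewing direction at each moment and stringing the picks together, can be sketched as follows. This is only an illustration of that greedy baseline, not the learned approach this project is developing. The input format (one dictionary per frame, mapping a yaw angle in degrees to a saliency score), the toy scores, and the function name `greedy_trajectory` are assumptions made for this example.

```python
def greedy_trajectory(saliency_per_frame):
    """Pick, for every frame, the viewing direction with the highest saliency.

    `saliency_per_frame` is a list of dicts mapping a candidate viewing
    direction (yaw in degrees, a hypothetical encoding) to a saliency score.
    """
    return [max(frame, key=frame.get) for frame in saliency_per_frame]


# Toy saliency scores for four consecutive frames and three candidate directions.
frames = [
    {0: 0.9, 90: 0.3, 180: 0.2},
    {0: 0.8, 90: 0.4, 180: 0.1},
    {0: 0.5, 90: 0.6, 180: 0.2},
    {0: 0.7, 90: 0.6, 180: 0.1},
]
trajectory = greedy_trajectory(frames)
print(trajectory)
```

Because each frame is decided independently, the resulting trajectory follows only the per-frame saliency peak and carries no notion of how a videographer would move between objects over time.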
Qian Zhou, Zhe Yang, Hongpeng Guo, Beitong Tian, Klara Nahrstedt, “360BroadView: Viewer Management for Viewport Prediction in 360-Degree Video Live Broadcast”, ACM International Conference on Multimedia in Asia (MM Asia) 2022.
Qian Zhou, Klara Nahrstedt, “Ultra-Sparse 360-Degree Camera View Synthesis for Immersive Virtual Tourism”, IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2022.
Qian Zhou, Bo Chen, Zhe Yang, Hongpeng Guo, Klara Nahrstedt, “360ViewPET: View-Based Pose EsTimation for Ultra-Sparse 360-Degree Cameras”, IEEE International Symposium on Multimedia (ISM) 2021, DOI: 10.1109/ISM52913.2021.00008
Ayush Sarkar, John Murray, Mallesham Dasari, Michael Zink, Klara Nahrstedt, “L3BOU: Low Latency, Low Bandwidth Optimized Super-Resolution Backhaul for 360-Degree Video Streaming”, IEEE International Symposium on Multimedia (ISM) 2021, December 2021, pp. 138-147, DOI: 10.1109/ISM52913.2021.00031 (Best Student Paper Award)
Ayush Sarkar, “Viewing Enhancement of 360 Videos in Diverse Contexts”, Master's Thesis, University of Illinois, Urbana-Champaign, May 2022
Jounsup Park, Klara Nahrstedt, “Navigation Graph for Tiled Media Streaming”, 27th ACM International Conference on Multimedia (MM ’19), Nice, France, 2019, pp. 447–455.
This research has been funded by the National Science Foundation Grant NSF 1900875.