Views: 1375 | Replies: 3

[Bounty] Answer the question in this thread and the author he1wen2zhi will award you 5 coins.
he1wen2zhi (new member)

[Help] Major revision, 20 days, 3 reviewers. How should I revise? (1 person has responded)
Which aspects should I focus on in the revision, and what experiments should I add?

Reviewer #3 says I did not run experiments on the dataset I proposed, but my paper states that I used my own dataset for training. How can I respond to this reasonably? Any suggestions would be much appreciated. The three reviews are as follows:

Reviewer #1: This paper presents an audio-visual cross-modality generation method for talking-face videos with rhythmic head motion. The studied topic is meaningful. The authors are suggested to further improve the paper from the following aspects.

The quality evaluation of the generated audio-visual talking heads is very important for the method design. The authors have used some criteria for evaluation. The authors may discuss whether it is possible to use quality assessment methods for evaluation, for example the audio-visual quality assessment methods proposed in 'Study of subjective and objective quality assessment of audio-visual signals' and 'Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment'. The authors are suggested to give some discussions on this aspect and the above works.

Regarding the claim 'The proposed method demonstrates improved performance in terms of video quality compared to traditional approaches': some discussion of visual quality assessment is suggested here, considering that there are many visual quality assessment studies in the literature, for example 'Blind quality assessment based on pseudo-reference image', 'Blind image quality estimation via distortion aggravation', 'Unified blind quality assessment of compressed natural, graphic, and screen content images', 'Objective quality evaluation of dehazed images', and 'Quality evaluation of image dehazing methods using synthetic hazy images'.
Following the above comments, the quality assessment of multimedia signals is also highly relevant to this work, so some surveys on quality assessment are suggested for the introduction section of the paper, for example 'Perceptual image quality assessment: a survey' and 'Screen content quality assessment: overview, benchmark, and beyond'.

Audio-visual attention is critical for various audio-visual applications, and many audio-visual attention prediction methods have been proposed, for example 'A multimodal saliency model for videos with high audio-visual correspondence' and 'Fixation prediction through multimodal analysis'. The authors may discuss the possibility of using audio-visual attention prediction methods to improve the proposed method, along with the above works.

Reviewer #2: This paper addresses the generation of realistic talking facial videos by incorporating audio and head-pose information. Existing methods lack natural head-pose generation and audio synchronization, impacting video realism. The authors propose Flow2Flow, an autoregressive method that encodes audio and historical head poses using a multimodal transformer block with cross-attention. They introduce AVVS, a large-scale dataset for investigating rhythmic head-movement patterns. The proposed method generates identity-independent facial motion representations, enabling photo-realistic videos with natural head poses and accurate lip-syncing, as demonstrated through experiments and comparisons with state-of-the-art approaches on public datasets. However, some concerns should be addressed.

The organization of the paper could benefit from improvements; e.g., part of the video synthesis is introduced in the feature-encoding part.

The authors pointed out that the full-attention structure in the model focuses excessively on a single source during integration, leading to the neglect of crucial information from other modalities; as a result, accurately generating movements for the facial generation task becomes challenging. It would be helpful to provide supporting evidence or examples to further illustrate this issue.

Instead of delving into the intricacies of flow theory, it would be more beneficial to focus on incorporating references in the facial attribute generation process.

The model utilizes 15 neutral keypoints as facial attributes. It would be valuable for the authors to explore the impact of varying the number of keypoints and to investigate whether incorporating certain 3DMM parameters and other types of audio features would enhance the results.

The authors have primarily focused on discussing the applications of common loss functions. However, IQA models also have wide-ranging applications in evaluating generative image, video, audio, and multimedia models, e.g., "Blind image quality assessment via cross-view consistency" and "Comparative perceptual assessment of visual signals using free energy features." The authors are suggested to give some discussions on this aspect and the above works. Additionally, considering the significance of the attention mechanism, the authors are encouraged to discuss related works such as "Toward visual behavior and attention understanding for augmented 360-degree videos," "Viewing behavior supported visual saliency predictor for 360-degree videos," and "Learning a deep agent to predict head movement in 360-degree images."

Reviewer #3: This paper proposes a normalizing-flow-based network to generate realistic talking-face videos, using audio and past head poses as inputs. The authors also contribute a solo-singing-themed audio-visual dataset called AVVS for research.

Strengths: 1. Experimental results show that the method can generate photo-realistic videos with natural head poses and lip-syncing, and the performance looks good. 2. Utilizing a normalizing flow model is novel and convincing.

Weaknesses: 1. It is kind of strange that I do not see any experiments on the AVVS dataset. Since you are proposing a dataset, I think some experiments should be conducted on it.
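For concreteness, the fusion that Reviewer #2 describes (a multimodal transformer block where audio features attend to historical head-pose features via cross-attention) can be sketched as below. This is a minimal single-head NumPy illustration, not the paper's actual Flow2Flow code; the function name, feature dimensions, and random projection matrices are all assumptions for the sake of the sketch (in a real model the projections would be learned).

```python
import numpy as np

def cross_attention(audio_feats, pose_feats, d_k=32, seed=0):
    """Minimal single-head cross-attention sketch: audio frames (queries)
    attend to historical head-pose frames (keys/values).

    audio_feats: (T_a, D) audio frame features
    pose_feats:  (T_p, D) past head-pose features
    Returns fused features of shape (T_a, D).
    """
    rng = np.random.default_rng(seed)
    D = audio_feats.shape[1]
    # Randomly initialised projections (learned parameters in a real model).
    W_q = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_k = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_v = rng.standard_normal((D, D)) / np.sqrt(D)

    Q = audio_feats @ W_q                        # (T_a, d_k)
    K = pose_feats @ W_k                         # (T_p, d_k)
    V = pose_feats @ W_v                         # (T_p, D)

    scores = Q @ K.T / np.sqrt(d_k)              # (T_a, T_p)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ V                              # (T_a, D)

# Example: 10 audio frames attend to 5 past head-pose frames.
audio = np.random.default_rng(1).standard_normal((10, 64))
poses = np.random.default_rng(2).standard_normal((5, 64))
fused = cross_attention(audio, poses)
print(fused.shape)  # (10, 64)
```

Reviewer #2's concern about full attention "excessively focusing on a single source" corresponds, in this picture, to the softmax rows collapsing onto one key; restricting queries and keys to separate modalities, as in cross-attention, is one way such collapse across modalities is typically mitigated.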