Topic: Transfer Learning Under High-Dimensional Network Convolutional Regression Model
Speaker: Professor Huang Danyang, School of Statistics, Renmin University of China
Host: Professor Lin Huazhen, School of Statistics and Data Science
Time: May 14, 16:00-17:00
Venue: Conference Room 408, Hongyuan Building, Liulin Campus
Organizers: School of Statistics and Data Science; Office of Scientific Research
About the speaker:
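Huang Danyang is a professor at the School of Statistics, Renmin University of China, a Wu Yuzhang Young Scholar, and director of the Beijing Consumption Big Data Monitoring Sub-Laboratory of the National Governance Big Data and Artificial Intelligence Innovation Platform. Professor Huang has led research projects including a General Program grant from the National Natural Science Foundation of China and a Key Project of the Beijing Social Science Fund, was selected for the Young Talent Support Program of the Beijing Association for Science and Technology, and has received Beijing Outstanding Talent Training funding. Huang's theoretical research covers complex network models and large-scale data computing, with attention to applications of statistical theory in the digital development of small and medium-sized enterprises. More than thirty papers have appeared in leading journals including JRSSB, JASA, JOE, and JBES. The sole-authored monograph "Large-Scale Network Data Analysis and Spatial Autoregressive Models" was listed on the JD.com bestseller ranking for statistics books. Teaching honors include second prize in the Beijing Universities Young Teachers' Teaching Fundamentals Competition and a Most Popular with Students award.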
Abstract:
Transfer learning enhances model performance by utilizing knowledge from related domains, particularly when labeled data is scarce. While existing research addresses transfer learning under various distribution shifts in independent settings, handling dependencies in networked data remains challenging. To address this challenge, we propose a high-dimensional transfer learning framework based on network convolutional regression (NCR), inspired by the success of graph convolutional networks (GCNs). The NCR model incorporates random network structure by allowing each node’s response to depend on its features and the aggregated features of its neighbors, capturing local dependencies effectively. Our methodology includes a two-step transfer learning algorithm that addresses domain shift between source and target networks, along with a source detection mechanism to identify informative domains. Theoretically, we analyze the lasso estimator in the context of a random graph based on the Erdős–Rényi model assumption, demonstrating that transfer learning improves convergence rates when informative sources are present. Empirical evaluations, including simulations and a real-world application using Sina Weibo data, demonstrate substantial improvements in prediction accuracy, particularly when labeled data in the target domain is limited.
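To make the ideas in the abstract concrete, here is a minimal NumPy sketch of one way such a pipeline could look. It is not the speaker's implementation. Everything specific in it is an assumption made for illustration: the design matrix `[X, D^{-1} A X]` (own features stacked with neighbor-averaged features), the pool-then-correct form of the two-step procedure, the ISTA lasso solver, the tuning values, and all function names. The source detection mechanism is not covered.

```python
import numpy as np


def er_adjacency(n, p_edge, rng):
    """Symmetric Erdős–Rényi adjacency matrix with no self-loops."""
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)
    return (upper | upper.T).astype(float)


def ncr_design(X, A):
    """Assumed NCR design: own features next to neighbor-averaged features."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # isolated nodes: avoid 0/0
    return np.hstack([X, (A @ X) / deg])


def lasso(Z, y, lam, n_iter=500):
    """Lasso via proximal gradient (ISTA) on (1/2n)||y - Zb||^2 + lam*||b||_1."""
    n, d = Z.shape
    step = n / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(n_iter):
        u = b - step * (Z.T @ (Z @ b - y) / n)
        b = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft-threshold
    return b


def two_step_transfer(Z_src, y_src, Z_tgt, y_tgt, lam_pool, lam_delta):
    """Hypothetical two-step scheme: pooled fit, then a sparse target correction."""
    Z_all = np.vstack([Z_src, Z_tgt])
    y_all = np.concatenate([y_src, y_tgt])
    w = lasso(Z_all, y_all, lam_pool)                        # step 1: pool domains
    delta = lasso(Z_tgt, y_tgt - Z_tgt @ w, lam_delta)       # step 2: debias on target
    return w + delta


def simulate(n, p, theta, p_edge, sigma, rng):
    """Draw one network, its node features, and responses from the assumed model."""
    X = rng.standard_normal((n, p))
    Z = ncr_design(X, er_adjacency(n, p_edge, rng))
    return Z, Z @ theta + sigma * rng.standard_normal(n)


# Toy comparison: small target network, larger and slightly shifted source network.
rng = np.random.default_rng(0)
p, n_src, n_tgt, sigma = 50, 400, 40, 0.5
theta_tgt = np.zeros(2 * p)
theta_tgt[:3] = 1.0          # effects of a node's own features
theta_tgt[p:p + 2] = 1.0     # effects of neighbor-averaged features
theta_src = theta_tgt.copy()
theta_src[3] = 0.1           # small domain shift between source and target

Z_src, y_src = simulate(n_src, p, theta_src, 5 / n_src, sigma, rng)
Z_tgt, y_tgt = simulate(n_tgt, p, theta_tgt, 5 / n_tgt, sigma, rng)

# Penalties follow the usual sigma * sqrt(2 log d / n) scaling (an assumption).
lam_tgt = sigma * np.sqrt(2 * np.log(2 * p) / n_tgt)
lam_pool = sigma * np.sqrt(2 * np.log(2 * p) / (n_src + n_tgt))

theta_lasso = lasso(Z_tgt, y_tgt, lam_tgt)
theta_trans = two_step_transfer(Z_src, y_src, Z_tgt, y_tgt, lam_pool, lam_tgt)
err_lasso = np.linalg.norm(theta_lasso - theta_tgt)
err_trans = np.linalg.norm(theta_trans - theta_tgt)
print(f"target-only lasso error: {err_lasso:.3f}")
print(f"two-step transfer error: {err_trans:.3f}")
```

In this toy setup the edge probability is scaled as `5 / n` so that both networks have a similar average degree; with much denser graphs the neighbor-averaged columns shrink toward zero and a plain lasso penalty would need rescaling.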