Shuai Yuan

Welcome to my homepage.

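I am currently a member of the Large-scale System Architecture (LSA) Laboratory at National Tsing Hua University, Taiwan.
I am working on erasure codes and cloud storage systems.
My advisor is Prof. Jerry Chou (周志遠).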

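Before this, I was:
» A research intern in the Proval group of the LRI (Laboratoire de Recherche en Informatique), Université Paris-Sud (Paris XI), France.
My topic was "Automated Constraint Verification for Databases".
My advisors were Prof. Véronique Benzaken and Prof. Évelyne Contejean.
» A member of the VIPA group at the Innovation Software R&D Center of Zhejiang University (Eagle Lab) / the Zhejiang University-Microsoft Joint Laboratory.
My topic was "Scene Sound Recognition for Images".
My advisor was Prof. Mingli Song.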

Here is my CV (html | pdf).

Here is my blog.

Academics

Research Interests | Research Experience

Research Interests

Cloud computing
Storage systems
Erasure codes
Constraint checking
Machine learning

Research Experience

I joined the Microsoft Visual Perception Laboratory of Zhejiang University in 2010, as a junior. Under the direction of Prof. Mingli Song, I started reading papers across a wide range of areas: speech-driven facial animation, speech emotion recognition, audio event detection (AED), music emotion recognition, sound localization, unstructured audio scene recognition, and also image inpainting and image completion. Later, we combined image scene classification with auditory scene recognition and developed an approach to recognize the scene sounds of images: given an image, find the environmental sounds that fit the scene it depicts. Probabilistic Latent Semantic Analysis (pLSA) is used to extract features from the training images, and Matching Pursuit (MP) to extract features from the training sounds. A machine learning approach then recognizes the environmental sounds corresponding to a given image.

In the training stage, pLSA is used to obtain the topic distribution P(z|d) of each image, while MP is used to obtain, for each environmental sound, the signal reconstructed from its first 10 Gabor atoms. We concatenate the topic distribution P(z|d) of each image with the reconstructed signal of its paired sound, giving one mixed feature vector per training image-sound pair. We then cluster these mixed feature vectors, recording the cluster index of each vector and the centroid of each cluster.

In the testing stage, we concatenate the topic distribution P(z|d) of the input image with the reconstructed signals of different candidate sounds, giving a set of test mixed feature vectors. We compare these vectors with the centroids of the training clusters and pick the test vector that is most similar to the centroid of some training cluster; that cluster gives the category, and the target sounds are the sounds in that category.
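A minimal sketch of the training and testing stages above, in plain Python. It rests on several assumptions: the clustering algorithm is not named in the description, so k-means is used here (with a deterministic farthest-first initialisation); the audio features are short numeric vectors standing in for the MP-reconstructed signals; pLSA and MP themselves are not shown; and the function names are hypothetical.

```python
def concat(topic_dist, audio_feat):
    """Mixed feature vector: the image's topic distribution P(z|d) followed by the audio features."""
    return list(topic_dist) + list(audio_feat)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(vectors, k, max_iters=100):
    """Plain k-means (assumed clustering method); returns (cluster index per vector, centroids)."""
    # Farthest-first initialisation keeps this sketch deterministic.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(v, centroids) for v in vectors]  # assignment step
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):  # update step: move each centroid to the mean of its members
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def match_scene_sounds(topic_dist, candidate_audio, centroids):
    """Testing stage: pair the input image with every candidate sound, find the mixed
    vector closest to some training centroid, and return (cluster, candidate index)."""
    best = None
    for j, audio in enumerate(candidate_audio):
        v = concat(topic_dist, audio)
        c = nearest(v, centroids)
        d = sq_dist(v, centroids[c])
        if best is None or d < best[0]:
            best = (d, c, j)
    return best[1], best[2]
```

To use it, build the training vectors with concat, run kmeans once, and for each test image call match_scene_sounds; the training sounds whose label equals the returned cluster are the recommended scene sounds.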

During my stay in the Proval group as a research intern, I worked on my bachelor's thesis, "Automated Constraint Verification for Databases", under the guidance of Véronique Benzaken and Évelyne Contejean. The thesis was motivated by the observation that no real-world DBMS (database management system) currently provides full support for managing integrity constraints. Instead, DBMSs offer triggers as an alternative; however, the behavior of triggers is complex and their semantics are hard to understand.

We present a strategy for automatically verifying the integrity constraints of databases, based on the weakest-precondition and predicate-transformer approaches. First, we express the integrity constraints of a database as SQL assertions, and then translate these assertions into first-order logic (FOL) formulas. Based on this logical formalization of both SQL assertions and data modification operations, we implement integrity constraint checking for databases with the help of the program verification platform Why3. Our program translates the input SQL statements into a WhyML program; Why3 is then called to compute the weakest preconditions and generate verification conditions for the back-end provers (such as Alt-Ergo and CVC3). Finally, the provers check whether the database still satisfies the constraints after the data modification operations are executed. The whole process is fully automatic. My bachelor's thesis is available in Chinese, and English slides are also available. The source code is released under GPLv3. (View assertion-verification on GitHub)
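As an illustration of the weakest-precondition idea, here is a hand-worked sketch on a hypothetical table emp(dept, salary); it shows the general technique, not the exact encoding the thesis generates for Why3. Take the SQL assertion CHECK (NOT EXISTS (SELECT * FROM emp WHERE salary < 0)) as the constraint Q, and the data modification S: UPDATE emp SET salary = salary - 100 WHERE dept = 'R&D'. Because S changes only the salary of the rows matching its WHERE clause, wp(S, Q) splits every row into the updated case and the untouched case:

```latex
\begin{align*}
Q &\equiv \forall r \in \mathit{emp}.\; r.\mathit{salary} \ge 0 \\
\mathrm{wp}(S, Q) &\equiv \forall r \in \mathit{emp}.\;
  \bigl(r.\mathit{dept} = \text{R\&D} \Rightarrow r.\mathit{salary} - 100 \ge 0\bigr)
  \wedge \bigl(r.\mathit{dept} \neq \text{R\&D} \Rightarrow r.\mathit{salary} \ge 0\bigr) \\
\mathit{VC} &\equiv Q \Rightarrow \mathrm{wp}(S, Q)
\end{align*}
```

Here the verification condition is not valid: an R&D row with a salary between 0 and 99 satisfies Q but violates wp(S, Q), so the provers cannot discharge it, which signals that this update may break the constraint.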

I am now working on problems related to cloud storage systems and erasure codes. Recently, I implemented a GPGPU approach to accelerating Reed-Solomon coding; the source code and some documentation are available. (View GPU-RSCode on GitHub)
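The core of Reed-Solomon encoding is a matrix-vector product over GF(2^8), computed independently at every byte offset, which is what makes it a good fit for GPGPU. The Python sketch below shows that computation on the CPU to make the data parallelism visible. Its choices are assumptions, not taken from GPU-RSCode: the primitive polynomial 0x11D, a Cauchy coding matrix, and the hypothetical helpers encode and recover_one (which rebuilds only a single lost data chunk, to keep the example short).

```python
# GF(2^8) log/antilog tables built from the primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D).
PRIM_POLY = 0x11D
EXP = [0] * 512  # doubled so EXP[LOG[a] + LOG[b]] never needs a modulo
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements (addition in this field is plain XOR)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a non-zero GF(2^8) element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def cauchy_matrix(m, k):
    """m x k Cauchy matrix C[i][j] = 1 / (x_i + y_j) with x_i = i and y_j = m + j.
    All x_i and y_j are distinct, so x_i XOR y_j is never 0 and every square
    submatrix is invertible, which makes the systematic code MDS."""
    if m + k > 256:
        raise ValueError("need m + k <= 256 distinct field elements")
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]


def encode(data_chunks, m):
    """Compute m parity chunks from k equally sized data chunks."""
    k, size = len(data_chunks), len(data_chunks[0])
    C = cauchy_matrix(m, k)
    parity = [bytearray(size) for _ in range(m)]
    for i in range(m):
        for b in range(size):  # byte offsets are independent: one GPU thread each
            acc = 0
            for j in range(k):
                acc ^= gf_mul(C[i][j], data_chunks[j][b])
            parity[i][b] = acc
    return parity


def recover_one(data_chunks, parity0, e, m):
    """Rebuild the single lost data chunk e from the surviving chunks and parity chunk 0."""
    k = len(data_chunks)
    C = cauchy_matrix(m, k)
    inv_coef = gf_inv(C[0][e])
    out = bytearray(len(parity0))
    for b in range(len(parity0)):
        acc = parity0[b]
        for j in range(k):
            if j != e:  # cancel the contribution of every surviving chunk
                acc ^= gf_mul(C[0][j], data_chunks[j][b])
        out[b] = gf_mul(inv_coef, acc)  # acc == C[0][e] * d_e[b]
    return out
```

On a GPU, the two outer loops of encode would typically map onto a grid of threads, with the log/antilog tables kept in fast on-chip memory; recovering several lost chunks would instead require inverting a square submatrix of the coding matrix.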

Technical Skills

Programming Skills | Tools | Projects

Skill Tree

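Regular skills: steady output, but standard-issue gear.
C, C++, Java (the mainstream languages of software development)
Verilog HDL (the mainstream language of VLSI design)
Matlab/Octave (the mainstream languages of scientific computing)
Half-baked skills:
OCaml (a niche language outside France; the first functional programming language I ever touched XD)
Bonus skills: so far in my battle record, used only for grinding small monsters to level up.
Python (for test cases and socket programs; played around a bit with Pygame)
Markdown, HTML markup.
Advanced skills:
- GPGPU: CUDA, OpenCL
- Parallel computing: Hadoop, MPI
- Graphics: OpenGL
- Databases: I know SQL~~ (I graduated on the strength of it, so how could I not?)~~, and have used DBMSs such as PostgreSQL and MySQL
- Web development: more familiar with the back end; have written JSP/Servlet code
New branches of the skill tree I plan to learn: Ruby, LISP, Haskell, etc.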

Tools

A craftsman who wants to do good work must first sharpen his tools.
-- The Analects, Book XV (Wei Ling Gong)

Here are the "sharp tools" I have used:

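Version control: git is my favorite; I have also used svn and even the ~~ancient~~ cvs.
Editors/IDEs: a loyal VIM fan. As for IDEs, I used Eclipse, Visual Studio, and Xilinx ISE in the past.
I rarely use IDEs now and usually build executables with GNU Make, though I prefer autoconf and automake.
A tinkerer on GNU/Linux (currently an Arch Linux fan); I can write bash scripts and use sed (a must-have for leveling up as a VIMer) and awk.
Document preparation: I write documents in LaTeX and generate figures with Graphviz DOT~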

Projects

Talk is cheap. Show me the code.
— Linus Torvalds

See:

Interests

Music | Sports

Music

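I'm a "classic fan", which is short for "classical music fan". My current status: I listen widely and casually, and I don't shy away from even the most hardcore modern music; I'm no expert, though, and may not recognize some very popular works.
Here is a "review" I wrote long ago: Innovation and Conservation -- Classical Music of the UK, which roughly describes my understanding of British music at the time (given my clumsy writing and limited knowledge, experts should proceed with caution XD).
I am also a violin enthusiast and ~~performer~~ learner. I took a long detour before I was fortunate enough to be guided by an excellent teacher, and then made rapid progress in both technique and interpretation. Nowadays, however, I mostly explore on my own and my technique has nearly stalled. I hope to understand more deeply, and express better, the music I can play, but that is not the work of a single year, so for now I simply play for my own enjoyment.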

Sports

My favorite sports are badminton and table tennis. I also enjoy watching tennis and basketball matches, though I can't play either~

Email

yszheda AT gmail DOT com

Phone

0988473989/(886)988473989

Mailing Address

Room 836, EECS Building, National Tsing Hua University, No. 101, Sec. 2, Kuang-Fu Road, Hsinchu 30013, Taiwan

Office Location

View EECS, NTHU in a larger map

This HTML5 website is adapted from the webpage of Denis Cousineau. The original template is from WebDesignerAid.com. The source is available under the GPL.