Sizhe Chen(陈思哲)

Biography

Hi! I am a CS Ph.D. student at UC Berkeley, where I am fortunately advised by Prof. David Wagner in Berkeley AI Research (BAIR). I am working closely with Chuan Guo (Meta FAIR), Nicholas Carlini (Anthropic), and Chawin Sitawari (Google DeepMind), supported by Meta- and Google-BAIR Commons. I got my M.Eng. (National Scholarship) and B.Eng. (Summa Cum Laude) from Shanghai Jiao Tong University advised by Prof. Xiaolin Huang, and also with Prof. Cihang Xie.

My research focuses on AI security in real-world applications. I am currently working on prompt injection defense (SecAlign, StruQ, Jatmo) for safely using LLMs in systems, e.g., as agents. Prompt injection attack is listed as the #1 threat to LLM-integrated applications, where the trusted system serves the trusted user when interactiving with the untrusted environment. To better answer the user instruction (in the prompt part), the system retrieves external data (documents, webpages, API returns, etc), which may contain injected instructions (Ignore previous instructions and …) trying to manipulate the system.

I am fortunate to have mentored or worked with lots of talented students: Yizhu Wang, Jing Qian, Shutong Wu, Zhixing Ye, Hend Alzahrani, and Zhengbao He. Feel free to drop me an email to connect! I accept approximation on my name’s pronunciation.

Invited Talks

  • Prompt Injection Defenses
    Guest Lecture at Generative AI: Foundations, Applications, and Safety (Duke) 2025
    UC Berkeley Security Seminar 2024
    Hong Kong Baptist University TMLR Young Scientist Seminar 2024
    Shanghai Jiao Tong University PAMI Group Seminar 2024
  • On the Learning Preference of Deep Neural Networks
    ICLR Oral Track 2023
    AI Time Youth Ph.D. Talk 2023
  • Subspace Adversarial Training
    CVPR Oral Track 2022
  • Adversarial Attacks and Defenses
    Northeastern University Security Seminar 2022

(Prompt Injection) Security of LLMs

  • SecAlign: Defending Against Prompt Injection with Preference Optimization
    Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, Chuan Guo

    SecAlign aims at a prompt-injection-robust LLM that prefers (and thus output) the secure response over the insecure one. For this property, we build a preference dataset, where the “input” contains a prompt injection; the “desirable output” responds to the user instruction; and the “undesirable output” responds to the injected instruction. Preference optimization on this dataset reduces success rates of strong optimization-based prompt injections by a factor of >4 from StruQ.
  • StruQ: Defending Against Prompt Injection with Structured Queries
    Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner

    StruQ is a general approach for prompt injection defense by separating the prompt (user instruction) and data into two channels. We design a secure front-end that formats a prompt and data into a special format, and a specially trained LLM that can produce high-quality outputs from these inputs. We augment the instruction tuning dataset with examples that contains a prompt injection, and do SFT on the model to ignore injections in data. StruQ reduces optimization-free attack success rates to <2%.
  • Jatmo: Prompt Injection Defense by Task-Specific Finetuning
    Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner

    Jatmo defends against prompt injection by fine-tuning a base LLM on only one task using (data, output) samples. Without seeing any task instruction, the defended LLM has no instruction-following ability that enables it to follow injected instructions. 0% attack success rates are achieved against optimization-free attacks.

Security of Vision Models (Previous SoP)

  • One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks
    Shutong Wu*, Sizhe Chen*, Cihang Xie, Xiaolin Huang

    OPS poisons model training by perturbing only one pixel in each image (taking few minutes) and robustly degrades the model accuracy on clean data to almost an untrained counterpart.
  • Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Attacks
    Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang

    AAA specifically defends against score-based query attacks by maintaining predictions while disrupting gradients, which offers a plug-in solution that preserves accuracy, calibration, and inference speed.
  • Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet
    Sizhe Chen, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang

    AoA proposes that transfer attacks should seek for features that are shared across different architectures for common vulnerabilities, which produces DAmageNet, a 50K-sized dataset with >70% error rate on all tested models.
  • Subspace Adversarial Training
    Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

    Sub-AT approaches catastrophic overfitting and robust overfitting in adversarial training by constraining it in an extracted subspace, which enables 1-step Sub-AT to have a competitive performance vs. standard 10-step AT.

Services

  • Reviewer: CCS 2024/2025, SaTML 2025, NeurIPS 2023/2025, ICML 2024/2025, ICLR 2023/2024/2025, CVPR 2023/2024/2025, ICCV 2023, ECCV 2022/2024, IEEE TPAMI, Machine Learning, Pattern Recognition
  • UC Berkeley EECS Student Reviewer: Faculty Hiring Committee 2024, Ph.D. Admission Committee 2024, Equal Access to Application Assistance 2024

Awards

  • Research Fundings: Meta-BAIR Commons 2024-2026, Google-BAIR Commons 2024-2025, UC Berkeley EECS Departmental Fellowship 2023, NeurIPS 2022 / ICLR 2023 Travel Support
  • Degree Awards: SJTU Extraordinary Bachelor’s Thesis (1%, Summa Cum Laude equivalent) 2020, SJTU Outstanding Graduate 2022/2023
  • Scholarship: China National Scholarship (0.2%) 2021/2022, Kwang-Hua Scholarship 2019, Arawana Scholarship 2017

Misc

  • I practice neatness and minimalism.
  • I play table tennis, play badminton, and ski.
  • I love to sing, attend classic or pop concerts, and travel to large cities.
  • I write blogs (in Chinese yet) about my thoughts and experience.
  • My Erdös number is 3 due to my collaboration with Chuan Guo.