Bay Area, CA

Tao Yang

Applied AI Systems Builder

10+ years building production AI and ML systems, from research prototypes to enterprise platforms, spanning AI agents, agentic workflows, context-aware automation, document intelligence, NLP, and applied ML.

Executive Summary

Turning AI agents and workflows into production enterprise systems.

I lead applied AI work across product, platform, and research, turning ambiguous AI opportunities into reliable production systems that can operate under real organizational, technical, and regulatory constraints.

My recent work focuses on AI agents, intelligent workflows, context-aware automation, document intelligence, and enterprise AI platforms for regulated domains, with emphasis on evaluation, human review, tenant-specific adaptation, and production delivery.

Earlier work spans healthcare conversational AI, semantic retrieval, medical NLP, knowledge graphs, and applied ML, giving me a long view of how language systems have evolved from classical NLP pipelines into LLM-native and agentic architectures.

Experience

Director

Taimei Technology · Pleasanton, CA

  • Lead applied AI strategy, platform development, and product incubation across clinical research and enterprise AI products, with focus on agentic workflows, document intelligence, and regulated knowledge work.
  • Build LLM-native and workflow-driven AI systems combining context-aware automation, document understanding, knowledge adaptation, evaluation pipelines, and human review for production environments.
  • Enable AI-assisted automation across protocol analysis, clinical data management, document intelligence, TLF/statistical programming, quality review, and enterprise knowledge work.
  • Lead cross-functional execution across research, engineering, product, and delivery teams, moving AI capabilities from exploratory prototypes into SaaS products and private-deployment solutions.

Tech Lead, Senior Research Scientist

Tencent America · Palo Alto, CA

  • Led applied research and product development for healthcare conversational AI, semantic QA, retrieval products, task bots, and semantic FAQ systems.
  • Built knowledge-grounded dialogue and retrieval workflows combining semantic search, knowledge graphs, machine reading, and structured medical content.
  • Developed practical NLP components for intent understanding, question interpretation, and retrieval-driven user journeys in healthcare scenarios.

Senior Research Scientist

Baidu US Research · Sunnyvale, CA

  • Worked on applied NLP, semantic retrieval, and knowledge-driven language understanding for Baidu's medical AI platform initiative.
  • Built modules for medical content understanding, question interpretation, ANN-based semantic search, and retrieval-driven QA workflows.

Core Expertise

Applied AI Systems

AI agents, agentic workflows, context-aware automation, document intelligence, evaluation systems

Knowledge & Language AI

NLP, semantic retrieval, knowledge graphs, medical language understanding, information extraction

Product & Platform

Enterprise AI platforms, AI-native applications, workflow automation, human-in-the-loop systems

Leadership

AI strategy, product incubation, team leadership, cross-functional execution, production delivery

Selected Projects

Representative AI systems and agentic workflows.

TLF Reporting Automation Agents

A LangGraph-based clinical reporting system that turns TLF shell specifications into generated SAS code and a reviewable reasoning trace. The workflow progressively loads ADaM metadata, macro knowledge, and reference programs only when needed, then combines generation, quality review, human feedback, and SAS runtime feedback into an iterative code repair loop for large reporting packages.
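The iterative repair loop described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the LangGraph implementation: the names `repair_loop`, `RepairResult`, `toy_generate`, and `toy_check` are hypothetical, the generator stands in for an LLM call, and the checker stands in for quality review, human feedback, and SAS runtime feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepairResult:
    code: str                      # last generated program
    passed: bool                   # True if the final check reported no issues
    trace: List[str] = field(default_factory=list)  # one entry per round


def repair_loop(
    spec: str,
    generate: Callable[[str, List[str]], str],
    check: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> RepairResult:
    """Generate code, check it, and feed the issues back until it passes."""
    feedback: List[str] = []
    trace: List[str] = []
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(spec, feedback)   # regenerate using prior feedback
        feedback = check(code)            # empty list means "no issues"
        trace.append(f"round {round_no}: {len(feedback)} issue(s)")
        if not feedback:
            return RepairResult(code, True, trace)
    return RepairResult(code, False, trace)


# Hypothetical stand-ins for the LLM generator and the review/runtime checks.
def toy_generate(spec: str, feedback: List[str]) -> str:
    code = f"proc print data={spec};"
    if feedback:                          # "repair" once feedback arrives
        code += " run;"
    return code


def toy_check(code: str) -> List[str]:
    return [] if code.endswith("run;") else ["missing run statement"]


result = repair_loop("adsl", toy_generate, toy_check)
```

Capping the number of rounds keeps a large reporting package from stalling on one program, and the per-round trace gives a reviewer a record of what was flagged and when.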

Trial Document Intelligence

An eTMF document classification system that avoids the brute-force LLM pattern of sending every taxonomy rule plus the full document into one long prompt. The workflow projects noisy trial documents into rule-like semantic descriptions, retrieves a compact Top-K candidate set from tenant-specific rule vectors, and uses an LLM-as-a-judge step to choose among those candidates only, which makes classification cheaper, faster, and more scalable.
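The retrieve-then-judge pattern above can be sketched as follows. This is a toy illustration under stated assumptions, not the production system: the semantic projection and embedding steps are assumed to have already produced `doc_vec`, the three-dimensional vectors and rule names in `TENANT_RULES` are invented, and `first_candidate_judge` is a hypothetical stand-in for the LLM-as-a-judge call.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_rules(
    doc_vec: Vector, rule_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Rank every rule by similarity to the document and keep the best k."""
    scored = [(rule_id, cosine(doc_vec, vec)) for rule_id, vec in rule_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def classify(
    doc_vec: Vector,
    rule_vecs: Dict[str, Vector],
    judge: Callable[[List[str]], str],
    k: int = 3,
) -> str:
    """Send only the Top-K candidate rule ids to the judge."""
    candidates = [rule_id for rule_id, _ in top_k_rules(doc_vec, rule_vecs, k)]
    return judge(candidates)


# Invented tenant-specific rule vectors (real ones would come from an embedding model).
TENANT_RULES: Dict[str, Vector] = {
    "protocol": [1.0, 0.0, 0.0],
    "informed_consent": [0.0, 1.0, 0.0],
    "monitoring_report": [0.0, 0.0, 1.0],
    "protocol_amendment": [0.9, 0.1, 0.0],
}


def first_candidate_judge(candidates: List[str]) -> str:
    """Hypothetical judge: accepts the top-ranked candidate."""
    return candidates[0]


label = classify([1.0, 0.05, 0.0], TENANT_RULES, first_candidate_judge, k=2)
```

The judge sees `k` candidates regardless of how large the tenant's taxonomy grows, so prompt size stays bounded while the retrieval step absorbs the scale.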

Clinical Data Management Agents

A clinical data management AI system that supports study setup through multiple focused agents and workflows. The system combines LLM-based protocol understanding and CRF design support, traditional ML for reusable template mining, SFT and specialized SFT for DVP-to-edit-check generation, and DPO-based preference alignment with expert review.

Education

2011 - 2016

Arizona State University

  • PhD, Computer Science, 2013 - 2016
  • MS, Computer Science, 2011 - 2013
  • Visiting PhD Scholar, University of Michigan - Ann Arbor, 2015 - 2016

2007 - 2011

Beijing Jiaotong University

  • BEng, Software Engineering, 2007 - 2011

Publications (selected)

Multiplex Graph Neural Network for Extractive Text Summarization. EMNLP 2021.
Medical Triage Chatbot Diagnosis Improvement via Multi-relational Hyperbolic Graph Neural Network. SIGIR 2021.
Commonsense Evidence Generation and Injection in Reading Comprehension. SIGDIAL 2020.
On the Generation of Medical Question-Answer Pairs. AAAI 2020.
Multi-Grained Named Entity Recognition. ACL 2019.
Augmented LSTM Framework to Construct Medical Self-Diagnosis Android. ICDM 2016.
Absolute Fused Lasso and Its Application to Genome-Wide Association Studies. KDD 2016.
Simultaneous Feature and Feature Group Selection Through Hard Thresholding. KDD 2014.
Full list on Google Scholar

Patents (selected)

Dynamic rule-based document classification using generative semantic projection. CN Application, 2026.
Efficient and compact text matching system for sentence pairs. US Patent, 2024.
Automatic CRF generation method and device, electronic equipment and storage medium. CN Patent, 2022.
Framework for Chinese text error identification and correction. US Patent, 2022.
Proximity information retrieval boost method for medical knowledge question answering systems. US Patent, 2022.
Method and Apparatus for Medical Data Auto Collection Segmentation and Analysis Platform. US Patent, 2021.
Full list on Google Scholar