The field of Artificial Intelligence and Large Language Models is transforming every industry. Start your journey with curated resources across foundational papers, speech AI (STT/TTS/voice cloning), and coding agents.
Join these communities to stay updated on the latest in AI and LLMs.
The foundational papers and concepts that power modern AI systems.
The architecture that changed AI
Introduces self-attention, multi-head mechanisms, positional encodings, and the encoder-decoder stack that underpins modern LLMs.
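The core operation the paper introduces can be sketched in a few lines. This is a minimal single-head version with NumPy (no masking, batching, or learned projections), not the full multi-head implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# three toy token vectors of dimension 4 standing in for projected embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)
```

Each output row is a weighted mix of all value vectors, with weights given by query-key similarity; multi-head attention runs several such maps in parallel over learned projections.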
Bidirectional Encoder Representations
Masked language modeling and next-sentence prediction for strong bidirectional representations across core NLP tasks.
Generative Pre-training and Scaling
Unsupervised pre-training + supervised fine-tuning; scaling laws and emergent capabilities in transformer language models.
Contrastive Language-Image Pre-training
Shared image-text representations enabling zero-shot classification and retrieval; a foundation for modern VLMs.
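Zero-shot classification with a shared embedding space reduces to a cosine-similarity lookup. The embeddings below are toy vectors standing in for real encoder outputs (an assumption for illustration):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    # cosine similarity between one image embedding and each label's text embedding
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img
    return labels[int(np.argmax(sims))]

# hypothetical 2-D embeddings; a real system would use the trained encoders
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = np.array([[1.0, 0.1], [0.1, 1.0]])
image_emb = np.array([0.9, 0.2])
pred = zero_shot_classify(image_emb, text_embs, labels)
```

No classifier head is trained: new categories are added by embedding new text prompts.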
Low-Rank Adaptation of LLMs
Parameter-efficient fine-tuning via low-rank matrix updates that cuts trainable parameters dramatically while preserving performance.
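The idea fits in a few lines: freeze the pretrained weight W and learn only a low-rank update BA. A minimal NumPy sketch (toy dimensions, no training loop):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4              # layer dims; rank r is much smaller than d, k
W = rng.normal(size=(d, k))      # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01
B = np.zeros((d, r))             # B starts at zero, so the update is initially a no-op

def lora_forward(x, W, A, B, alpha=8):
    # y = x W^T + (alpha / r) * x (B A)^T; only A and B receive gradients
    return x @ W.T + (alpha / A.shape[0]) * (x @ (B @ A).T)

x = rng.normal(size=(2, k))
y = lora_forward(x, W, A, B)
```

Here the adapter trains 2 * r * d = 512 parameters instead of d * k = 4096, and the merged weight W + (alpha/r) * B A can be folded back in for inference at no extra cost.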
High-Resolution Image Synthesis
Text-to-image generation using diffusion in a compressed latent space for efficient, high-quality synthesis.
Self-Supervised Speech Representations
Contrastive learning on raw audio for downstream ASR with minimal labeled data.
Robust ASR via Web-Scale Supervision
Large-scale multilingual speech recognition with strong zero-shot transcription and translation.
Generative Raw-Waveform Audio
Autoregressive raw-audio synthesis that launched the neural vocoder era for natural TTS.
Variational + Adversarial TTS
Unified acoustic and vocoder network with CVAE and GAN training for natural speech.
Speaker-Encoder + TTS Transfer
Few-shot voice cloning via transfer learning from speaker verification to multi-speaker TTS.
RAG for Knowledge-Intensive NLP
Combines parametric generation with non-parametric memory retrieval to improve factuality and updatability.
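The retrieve-then-generate pattern can be sketched with a toy bag-of-words retriever; real systems use a dense encoder and an actual LLM call, both of which are stand-ins here:

```python
import numpy as np

docs = [
    "The Transformer uses self-attention instead of recurrence.",
    "LoRA fine-tunes models with low-rank weight updates.",
]

def embed(text, vocab):
    # toy bag-of-words "embedding"; a real retriever uses a trained dense encoder
    return np.array([text.lower().count(w) for w in vocab], dtype=float)

vocab = sorted({w for d in docs for w in d.lower().split()})
doc_vecs = np.stack([embed(d, vocab) for d in docs])

def retrieve(query, k=1):
    q = embed(query, vocab)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * (np.linalg.norm(q) + 1e-9) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

context = retrieve("what replaces recurrence in the transformer?")[0]
prompt = f"Context: {context}\nQuestion: ...\nAnswer:"  # fed to a generator model
```

Because the knowledge lives in the document store rather than the model weights, updating facts means re-indexing documents, not retraining.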
Synergizing Reasoning and Acting
Interleaves chain-of-thought with tool actions, setting a practical pattern for modern agent loops.
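The Thought/Action/Observation loop is simple to express. This sketch uses a scripted fake LLM and a hypothetical `search` tool so it runs standalone; the `llm` callable and tool names are assumptions, not any specific API:

```python
def react_loop(question, llm, tools, max_steps=5):
    # Interleave Thought -> Action -> Observation until the model emits an Answer.
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)              # stand-in for a real model call
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        if step.startswith("Action:"):
            name, arg = step.removeprefix("Action:").strip().split(" ", 1)
            transcript += f"Observation: {tools[name](arg)}\n"
    return None

# scripted responses playing the role of the model
script = iter([
    "Thought: I should look this up.",
    "Action: search transformer paper year",
    "Answer: 2017",
])
answer = react_loop("When was the Transformer introduced?",
                    llm=lambda t: next(script),
                    tools={"search": lambda q: "Attention Is All You Need, 2017"})
```

Tool observations are appended to the transcript so each subsequent thought conditions on real results rather than hallucinated ones.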
Language Models Can Teach Themselves to Use Tools
Shows how LMs can self-supervise tool/API usage and call external functions when useful.
Direct Preference Optimization
Simple and effective alignment method that optimizes preferences without full RLHF complexity.
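The DPO objective for a single preference pair is one line of math: a logistic loss on the margin between policy and reference log-probabilities of the chosen vs. rejected response. A minimal sketch with made-up log-probability values:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # L = -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# policy prefers the chosen response relative to the reference -> low loss
good = dpo_loss(-10.0, -12.0, -11.0, -11.0)
# policy prefers the rejected response -> higher loss
bad = dpo_loss(-12.0, -10.0, -11.0, -11.0)
```

No reward model or RL rollout is needed; the loss is computed directly from sequence log-probabilities, which is the source of the method's simplicity.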
Self-Consistency for Chain-of-Thought Reasoning
Samples diverse reasoning paths and aggregates answers for stronger reasoning performance.
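The aggregation step is just a majority vote over the final answers of independently sampled reasoning chains (the sampled answers below are illustrative):

```python
from collections import Counter

def self_consistency(final_answers):
    # sample several reasoning paths at nonzero temperature,
    # keep only each path's final answer, and majority-vote
    return Counter(final_answers).most_common(1)[0][0]

samples = ["42", "42", "41", "42", "40"]  # final answers from 5 sampled chains
best = self_consistency(samples)
```

Diverse paths that agree on the same answer are strong evidence it is correct, which is why voting beats a single greedy chain.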
Practical models and projects for transcription, speech generation, and voice cloning in production systems.
Instructional, Streaming, and Voice Cloning TTS
Modern open TTS family with expressive control, low latency streaming, and multilingual support.
High-Quality Open TTS for Voice Agents
Open-source speech synthesis models optimized for naturalness, low latency, and practical deployment workflows.
Multilingual Voice Cloning Toolkit
Widely used open toolkit for TTS and cloning workflows with training, inference, and serving options.
Flexible Voice Style Transfer
Voice cloning and style transfer pipeline designed for cross-lingual and personalized voice synthesis.
Token Infilling for Speech Editing
Model family for zero-shot speech editing and continuation with strong speaker preservation.
Whisper + Alignment + Diarization
Popular extension for accurate word-level timestamps and optional speaker diarization workflows.
Speaker Segmentation and Attribution
Core toolkit for speaker diarization pipelines, frequently paired with ASR systems in production.
Agentic coding tools you can use from terminal and IDE environments for code generation, refactoring, and review.
Terminal-Native Agentic Coding
Open-source coding agent from OpenAI for editing code, running commands, and executing task loops locally.
Agentic Coding with Tool Use and MCP
CLI-first coding assistant with strong repo navigation, tool calling, and structured permission workflows.
Pair Programming in Git Repositories
Lightweight terminal assistant that works directly on tracked files and integrates tightly with git workflows.
General-Purpose Software Agent
Autonomous coding framework for issue resolution, repo work, and long-horizon software tasks.
VS Code Coding Agent
In-editor coding agent with terminal execution, file operations, and iterative planning for implementation tasks.
Bug-Fixing Benchmark for Coding Agents
Key benchmark for measuring practical coding-agent performance on real GitHub issues.
University courses on agents, LLMs, and AI systems.
Self-Improving AI Agents
Stanford course focused on agentic AI with lectures and curated readings on reasoning, tools, and planning.
Agentic AI - Fall 2025
UC Berkeley course on agentic AI: frameworks, memory, evaluation, and long-horizon tasks.
Systems for LLMs & AI Agents
Reading list spanning LLM infrastructure, agents, retrieval, and evaluation.
UC Berkeley RDI (Slides)
Introductory slide deck on agent architectures and research directions.