Learning Guide

Welcome to the AI Revolution

Artificial intelligence and large language models are changing how software is built and used across industries. Start your journey with curated resources covering foundational papers, speech AI (STT, TTS, and voice cloning), and coding agents.


Reddit Starter Kit

Join these communities to stay updated on the latest in AI and LLMs.

r/LocalLLaMA

Running Llama and other open-weight models locally: quantization, GPU tips, and routing.


r/ClaudeAI

Claude usage, prompts, tool use, and best practices.


r/OpenAI

APIs, fine-tuning, model updates, and ecosystem tools.


r/ChatGPT

ChatGPT tips, plugins, agents, and user workflows.


r/PromptEngineering

Prompt design patterns, evaluation, and safety guardrails.


r/MachineLearning

Broader ML research, papers, and deep-dive discussions.


Featured Learning Resources

The foundational papers and concepts that power modern AI systems.

Foundation - Transformers

Attention Is All You Need

The architecture that changed AI

Introduces the Transformer: scaled dot-product self-attention, multi-head attention, positional encodings, and the encoder-decoder stack that underpins modern LLMs.
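
To make the attention idea concrete, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. It is an illustration under simplifying assumptions, not the paper's implementation: the learned query/key/value projections, multiple heads, and masking are all omitted, and the function names are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Self-attention on a toy sequence: queries, keys, and values are the same tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualized = attention(tokens, tokens, tokens)
```

Each output row is a convex combination of the value vectors, so every token's new representation mixes in information from the tokens it attends to most.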

NLP - Bidirectional

BERT

Bidirectional Encoder Representations

Masked language modeling and next-sentence prediction for strong bidirectional representations across core NLP tasks.

LLM - Autoregressive

GPT

Generative Pre-training and Scaling

Unsupervised pre-training + supervised fine-tuning; scaling laws and emergent capabilities in transformer language models.

Multimodal - Vision-Language

CLIP

Contrastive Language-Image Pre-training

Shared image-text representations enabling zero-shot classification and retrieval; a foundation for modern VLMs.
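
As a rough sketch of how zero-shot classification works with a shared embedding space: embed the image, embed one text prompt per label, and pick the label whose text embedding has the highest cosine similarity to the image. The embeddings below are made-up toy vectors standing in for real encoder outputs, and the function names and temperature value are illustrative assumptions.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_classify(image_emb, label_embs, temperature=0.07):
    """Return a probability per label from cosine similarity to the image."""
    img = normalize(image_emb)
    logits = {}
    for label, emb in label_embs.items():
        txt = normalize(emb)
        logits[label] = sum(a * b for a, b in zip(img, txt)) / temperature
    # Softmax over the label logits.
    m = max(logits.values())
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy vectors (assumed); a real system would get these from image and text encoders.
probs = zero_shot_classify(
    [0.9, 0.1, 0.0],
    {"a photo of a dog": [1.0, 0.0, 0.0], "a photo of a cat": [0.0, 1.0, 0.0]},
)
best_label = max(probs, key=probs.get)
```

No classifier is trained for the new labels; changing the label set only means embedding different prompts.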

Efficient - Adaptation

LoRA

Low-Rank Adaptation of LLMs

Parameter-efficient fine-tuning via low-rank matrix updates - reduces trainable parameters while preserving performance.
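
A small sketch of the low-rank update may help. With a frozen weight W of shape d x k, LoRA trains two small matrices B (d x r) and A (r x k) and uses W + (alpha / r) * B A as the effective weight. The helper names here are hypothetical, and the pure-Python matrix code is for illustration only.

```python
def matmul(a, b):
    # Plain nested-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(w, b, a, alpha, r):
    """Effective weight W + (alpha / r) * B @ A. W stays frozen; only A and B train."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def trainable_params(d, k, r):
    # Full fine-tuning updates d*k values; LoRA updates r*(d + k).
    return {"full": d * k, "lora": r * (d + k)}

# For one 4096 x 4096 projection at rank 8.
counts = trainable_params(4096, 4096, 8)
```

For that example the trainable count drops from 16,777,216 to 65,536 for the layer. Because B starts at zero, the adapted model begins exactly at the pre-trained weights.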

Generative - Diffusion

Latent Diffusion Models

High-Resolution Image Synthesis

Text-to-image generation using diffusion in a compressed latent space - efficient, high-quality synthesis.

STT - Self-Supervised

wav2vec 2.0

Self-Supervised Speech Representations

Contrastive learning on raw audio for downstream ASR with minimal labeled data.

STT - Weak Supervision

Whisper

Robust ASR via Web-Scale Supervision

Large-scale multilingual speech recognition with strong zero-shot transcription and translation.

TTS - Neural Vocoder

WaveNet

Generative Raw-Waveform Audio

Autoregressive raw-audio synthesis that launched the neural vocoder era for natural TTS.

TTS - End-to-End

VITS

Variational + Adversarial TTS

Single end-to-end network that merges the acoustic model and vocoder, trained with a conditional VAE and adversarial (GAN) objectives for natural speech.

Voice Cloning

SV2TTS

Speaker-Encoder + TTS Transfer

Few-shot voice cloning via transfer learning from speaker verification to multi-speaker TTS.

RAG - Knowledge Grounding

Retrieval-Augmented Generation

RAG for Knowledge-Intensive NLP

Combines parametric generation with non-parametric memory retrieval to improve factuality and updatability.
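
The retrieve-then-generate pattern can be sketched in a few lines. This toy version swaps the paper's dense neural retriever for simple word overlap and stops at building the prompt that would be sent to a generator; the corpus, function names, and prompt layout are all assumptions for illustration.

```python
def score(query, doc):
    # Toy relevance score: number of shared lowercase words.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d)

def retrieve(query, corpus, k=2):
    # Return the k highest-scoring passages (the non-parametric memory lookup).
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query, corpus, k=2):
    # Ground the generator by placing retrieved passages ahead of the question.
    passages = retrieve(query, corpus, k)
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "LoRA adds low-rank adapters to frozen weights.",
    "Whisper is a speech recognition model.",
    "WaveNet generates raw audio.",
]
prompt = build_prompt("what is whisper speech recognition", corpus, k=1)
```

Updating what the system knows then means editing the corpus rather than retraining the model.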

Agents - Reasoning + Acting

ReAct

Synergizing Reasoning and Acting

Interleaves chain-of-thought with tool actions, setting a practical pattern for modern agent loops.
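
The loop this pattern implies looks roughly like the sketch below: the model emits a thought and an action, the runtime executes the tool and appends an observation, and the cycle repeats until a final answer appears. The model here is a scripted stand-in rather than a real LLM call, and the tool, text format, and function names are hypothetical.

```python
def calculator(expression):
    # Hypothetical tool: adds two integers written as "a+b".
    a, b = expression.split("+")
    return str(int(a) + int(b))

TOOLS = {"calculator": calculator}

def scripted_model(transcript):
    # Stand-in for an LLM (assumption): act once, then answer from the observation.
    if "Observation:" not in transcript:
        return "Thought: I should compute this.\nAction: calculator[2+3]"
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"

def react_loop(question, model, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse "Action: tool_name[argument]" and run the named tool.
        action = step.split("Action:", 1)[1].strip()
        name, arg = action.split("[", 1)
        transcript += f"Observation: {tools[name](arg.rstrip(']'))}\n"
    return None  # Gave up after max_steps without a final answer.

answer = react_loop("What is 2+3?", scripted_model, TOOLS)
```

The step cap matters in practice: without it, a model that never emits a final answer would loop indefinitely.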

Agents - Tool Use

Toolformer

Language Models Can Teach Themselves to Use Tools

Shows how LMs can self-supervise tool/API usage and call external functions when useful.

Alignment - Preference Optimization

DPO

Direct Preference Optimization

Simple and effective alignment method that trains the policy directly on preference pairs, with no separate reward model or reinforcement learning loop as in full RLHF.
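
For a single preference pair, the DPO objective can be written as a short function. The sketch below assumes you already have summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference model; the argument names and the beta value are illustrative.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * margin).

    Each argument is a summed log-probability of a full response.
    """
    # How much more the policy favors the chosen response than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(x)) equals log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))

# A policy identical to the reference gives a margin of 0, so the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The reference terms keep the policy from drifting arbitrarily far from the starting model while it learns the preferences.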

Reasoning - Prompting

Self-Consistency

Improves Chain-of-Thought Reliability

Samples diverse reasoning paths and takes a majority vote over their final answers for stronger reasoning performance.
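
The aggregation step is simple enough to sketch directly. Assuming the final answers have already been extracted from several independently sampled chains of thought, a majority vote picks the result; the function name and sample data are hypothetical.

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    """Return the most common answer and the fraction of samples that agree."""
    counts = Counter(a.strip() for a in sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

# Five sampled reasoning paths, reduced to their final answers (toy data).
answer, agreement = self_consistent_answer(["18", "18", "26", "18 ", "20"])
```

The agreement fraction is a useful by-product: low agreement is a cheap signal that the question may need more samples or a closer look.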

STT, TTS, and Voice Cloning Stack

Practical models and projects for transcription, speech generation, and voice cloning in production systems.

TTS - Frontier

Qwen3-TTS

Instructional, Streaming, and Voice Cloning TTS

Modern open TTS family with expressive control, low latency streaming, and multilingual support.

TTS - Open Source

Chatterbox

High-Quality Open TTS for Voice Agents

Open-source speech synthesis models optimized for naturalness, low latency, and practical deployment workflows.

TTS - Multilingual Cloning

XTTS v2 (Coqui)

Multilingual Voice Cloning Toolkit

Widely used open toolkit for TTS and cloning workflows with training, inference, and serving options.

Voice Cloning - Cross-Lingual

OpenVoice

Flexible Voice Style Transfer

Voice cloning and style transfer pipeline designed for cross-lingual and personalized voice synthesis.

Voice Cloning - Editing

VoiceCraft

Token Infilling for Speech Editing

Model family for zero-shot speech editing and continuation with strong speaker preservation.

STT - Word Timestamps

WhisperX

Whisper + Alignment + Diarization

Popular extension for accurate word-level timestamps and optional speaker diarization workflows.

STT - Diarization

pyannote.audio

Speaker Segmentation and Attribution

Core toolkit for speaker diarization pipelines, frequently paired with ASR systems in production.

Speech Translation

SeamlessM4T

Multimodal Speech-to-Speech Translation

End-to-end speech and text translation stack for multilingual audio and speech communication tasks.

CLI Agents and Developer Workflows

Agentic coding tools you can use from terminal and IDE environments for code generation, refactoring, and review.

Coding Agent - CLI

OpenAI Codex CLI

Terminal-Native Agentic Coding

Open-source coding agent from OpenAI for editing code, running commands, and executing task loops locally.

Coding Agent - CLI

Claude Code CLI

Agentic Coding with Tool Use and MCP

CLI-first coding assistant with strong repo navigation, tool calling, and structured permission workflows.

Coding Agent - Open Source

Aider

Pair Programming in Git Repositories

Lightweight terminal assistant that works directly on tracked files and integrates tightly with git workflows.

Coding Agent - Open Source

OpenHands

General-Purpose Software Agent

Autonomous coding framework for issue resolution, repo work, and long-horizon software tasks.

IDE Agent

Cline

VS Code Coding Agent

In-editor coding agent with terminal execution, file operations, and iterative planning for implementation tasks.

Evaluation - Agents

SWE-bench

Bug-Fixing Benchmark for Coding Agents

Key benchmark for measuring practical coding-agent performance on real GitHub issues.

Courses & Reading Lists

University courses on agents, LLMs, and AI systems.

Course - Agents

Stanford CS329A

Self-Improving AI Agents

Stanford course focused on agentic AI with lectures and curated readings on reasoning, tools, and planning.

Course - Agents

Berkeley CS294/194-196

Agentic AI - Fall 2025

UC Berkeley course on agentic AI: frameworks, memory, evaluation, and long-horizon tasks.

Readings - LLM Systems

UCSD CSE-291A

Systems for LLMs & AI Agents

Reading list spanning LLM infrastructure, agents, retrieval, and evaluation.

Slides - Agents

Agentic AI - Lecture 1

UC Berkeley RDI (Slides)

Introductory slide deck on agent architectures and research directions.

Need Guidance?

Looking for personalized AI learning guidance or strategic consulting?

QNeura.ai

AI & Quantum Computing Consulting