User Guide
Progressive guide from first launch to advanced usage.
Table of Contents
- Level 1: Getting Started
- Level 2: Daily Use Basics
- Level 3: Choosing the Right ASR Backend
- Level 4: AI Enhancement
- Level 5: Preview Power Features
- Level 6: Direct Mode & Streaming
- Level 7: Clipboard Enhancement
- Level 8: Custom Enhancement Modes
- Level 9: Multi-Provider Setup
- Level 10: Vocabulary & Conversation History
- Level 11: Launcher
- Level 12: Fine-Tuning & Troubleshooting
- What's Next?
A progressive guide from first launch to advanced usage. Follow the levels in order — each builds on the previous one.
Level 1: Getting Started
Goal: Install 闻字 and transcribe your first sentence.
Install
Option A — Download Release (easiest):
- Download
WenZi.appfrom the Releases page. - Drag it to
/Applications. - Double-click to launch.
First launch: macOS blocks unsigned apps. Go to System Settings → Privacy & Security, find the 闻字 blocked message, and click Open Anyway.
Option B — Build from Source:
git clone https://github.com/Airead/WenZi
cd WenZi
uv sync
./scripts/build.sh # builds WenZi.app in dist/
Option C — Run from Source (for developers):
git clone https://github.com/Airead/WenZi
cd WenZi
uv sync
uv run python -m wenzi
Grant Permissions
On first launch, macOS will ask for:
| Permission | Why |
|---|---|
| Microphone | Record your voice |
| Accessibility | Type text into other apps |
| Speech Recognition | Only needed for Apple Speech backend |
Grant all requested permissions in System Settings → Privacy & Security.
First Launch: Ready Immediately
The default ASR backend is Apple On-Device Speech — it uses the built-in macOS speech recognizer, so no model download is needed. 闻字 is ready to transcribe right after granting permissions.
Note: If you later switch to FunASR or MLX-Whisper in Settings, 闻字 will need to download a model (~75 MB to ~1.6 GB depending on the model). During download:
- The menubar icon changes to a download icon (⬇) with a percentage like
DL 45%- Please wait for the download to complete before trying to transcribe
- Click menubar → View Logs... to open the built-in log viewer and monitor download progress in real time
- Once loading finishes, the icon changes back to a microphone icon (🎙) and the status shows "Ready"
Tip: If a download fails or is interrupted, delete the cache directory (
~/.cache/modelscope/for FunASR,~/.cache/huggingface/for MLX-Whisper) and restart 闻字 to retry.
Your First Transcription
- Look for the microphone icon (🎙) in the menubar — that means 闻字 is ready.
- Open any text input (Notes, browser, editor, terminal…).
- Hold the
fnkey and speak. - Release
fn— the transcribed text appears.
That's it! You've completed the basic workflow.
Understanding the Menubar Icon
The menubar icon changes to reflect the current status:
| Icon | Status | Meaning |
|---|---|---|
| 🎙 (mic.fill) | Ready | Idle, ready to record |
| 〰 (waveform) | Recording... | Capturing audio |
| 💬 (text.bubble) | Transcribing... | Processing speech to text |
| ✨ (sparkles) | Enhancing... | AI enhancement in progress |
| 👁 (eye) | Preview... | Preview panel is open |
| ⬇ (arrow.down.circle) + DL X% | Downloading... | Model download in progress |
| ⚙ (cpu) | Loading... | Loading model into memory |
| ⚠ (triangle) | Error | Something went wrong |
Level 2: Daily Use Basics
Goal: Understand the two output modes and basic menubar controls.
Preview Mode vs Direct Mode
闻字 has two ways to deliver results:
| Mode | Behavior | When to use |
|---|---|---|
| Preview (default) | Shows a floating panel — review and edit before confirming | When accuracy matters, or you want to check before typing |
| Direct | Types text immediately into the active app | When speed matters and you trust the transcription |
Toggle via: menubar → Settings... → General tab → Preview checkbox.
Preview Panel Basics
When Preview is on, after recording you'll see a floating panel:
- Confirm (
Enter) — types the text and closes the panel - Copy to clipboard (
⌘+Enter) — copies the text to clipboard instead of typing - Cancel (
Esc) — discards the text - Edit — click the text area to modify before confirming
Menubar Overview
Click the microphone icon in the menubar to see the menu:
🎙
├── Ready (status indicator)
├── ─────────────────────
├── Enhance Clipboard AI-enhance selected text (Ctrl+Cmd+V)
├── Browse History... Search and browse past transcriptions
├── Settings... Open the settings panel (4 tabs)
├── ─────────────────────
├── View Logs... Open log viewer
├── Usage Stats View usage statistics
├── About 闻字 Version info
└── Quit
All model selection, AI enhancement configuration, and hotkey management are done through the Settings panel — not the menubar menu directly.
Recording Feedback
While holding fn, a floating indicator with audio level bars shows you're recording. A sound plays on start and stop (configurable in Settings → General).
When the ASR backend supports streaming (e.g., Apple Speech), a live transcription overlay appears below the recording indicator, showing partial transcription text in real time as you speak. This gives you immediate visual feedback without waiting for the recording to end.
Recording Controls
While holding the recording hotkey, you can press additional keys to control the session:
Key (while holding fn) |
Action |
|---|---|
Cmd (default) |
Restart recording — discards current audio and starts a new recording |
Space (default) |
Cancel recording — discards audio and returns to idle |
Z |
Show last preview — cancels recording and opens the last preview result |
The restart and cancel keys can be customized in Settings → General (or via config: feedback.restart_key and feedback.cancel_key).
Level 3: Choosing the Right ASR Backend
Goal: Pick the best speech engine for your needs.
Backend Comparison
| Backend | Language | Speed | Accuracy | Download Size |
|---|---|---|---|---|
| Apple Speech (default) | Multiple | Fast | Good | None (built-in) |
| FunASR | Chinese | Fast | High (Chinese) | ~500 MB |
| MLX-Whisper | 99 languages | Medium | High | 75 MB – 1.6 GB |
| Whisper API | Multiple | Depends on network | High | None (cloud) |
How to Switch
Open Settings... → STT tab. You'll see:
- Local section: Radio buttons for all available local ASR presets (FunASR, MLX-Whisper variants, Apple Speech)
- Remote section: Cloud-based ASR providers you've configured
Click a radio button to switch. The model will start loading (or downloading if not yet cached).
First download reminder: When switching to a new MLX-Whisper model for the first time, the model needs to download. Watch the menubar icon for
DL X%progress. Model sizes range from ~75 MB (tiny) to ~1.6 GB (large-v3-turbo).
Recommendations
- Zero-setup, any language → Apple Speech (default — no download, supports real-time streaming)
- Chinese only → FunASR (best accuracy for Chinese, fully offline)
- English or multilingual, high accuracy → MLX-Whisper small or large-v3-turbo
- Best accuracy, don't mind latency → Whisper API via Groq (free tier available)
Level 4: AI Enhancement
Goal: Use an LLM to proofread, translate, or reformat transcribed text.
AI enhancement is optional — by default it's off. When enabled, your transcribed text is sent to an LLM for post-processing before output.
Step 1: Set Up an LLM Provider
You need an LLM backend. Two easy options:
Option A — Local with Ollama (free, private):
- Install Ollama and run
ollama pull qwen2.5:7b - That's it — 闻字.s default config points to Ollama
Option B — Cloud API (e.g., DeepSeek, OpenAI):
- Open Settings... → LLM tab → Add Provider...
- Fill in provider details:
name: deepseek
base_url: https://api.deepseek.com/v1
api_key: sk-your-key
models:
deepseek-chat
- Click Verify → Save
Step 2: Select an Enhancement Mode
Open Settings... → AI tab. Select a mode:
| Mode | What it does |
|---|---|
| Off | No enhancement (raw transcription) |
| 纠错润色 (Proofread) | Fix typos, grammar, punctuation |
| 翻译为英文 (Translate EN) | Translate to English |
| 命令行大神 (Commandline) | Convert speech to shell commands |
Step 3: Try It
- Make sure an LLM provider is configured and a mode is selected.
- Hold
fn, say something, release. - The result now goes through the LLM before appearing.
Tip: Start with "纠错润色" (Proofread) — it's the most universally useful mode.
Level 5: Preview Power Features
Goal: Master the preview panel's editing and switching capabilities.
With Preview mode on and AI enhancement active, the preview panel becomes a powerful editor.
Quick Mode Switching
Press ⌘1 through ⌘9 to instantly switch enhancement modes and re-process the same audio:
⌘1= first mode in list (e.g., Proofread)⌘2= second mode (e.g., Translate EN)⌘3= third mode (e.g., Commandline)- …and so on for custom modes
Result Caching
When you switch modes in the preview panel, 闻字 caches completed results. Switching back to a previously used mode shows the cached result instantly (marked [cached]) — no API call needed.
The cache is cleared when new audio is recorded.
Preview History
闻字 keeps an in-memory history of your last 10 preview results (cleared on app restart). This lets you go back to a previous transcription without re-recording.
- History dropdown: Click the clock icon in the preview panel's toolbar to open a dropdown showing recent previews. Select one to reload it into the panel.
- Quick recall: Press
fn+Zat any time (even outside the preview panel) to cancel any active recording and instantly open the most recent preview result.
Web Preview Panel
The preview panel uses a modern WKWebView-based (HTML/CSS/JS) interface by default, providing a polished look with dark mode support. You can switch between the web-based and native AppKit preview in Settings → General → Web Preview toggle.
Other Preview Features
| Feature | How |
|---|---|
| Edit text | Click the text area and type |
| Copy to clipboard | ⌘+Enter — copies instead of typing into the active app |
| Toggle punctuation | Check/uncheck the Punc checkbox to re-transcribe with/without punctuation |
| Switch STT model | Use the STT dropdown in the panel |
| Switch LLM model | Use the LLM dropdown in the panel |
| Play audio | Click the play button to hear the recording |
| Save audio | Click save to export the recording as a file |
| Google Translate | Click the translate button to open Google Translate with current text |
Level 6: Direct Mode & Streaming
Goal: Use 闻字 for fast, hands-free input with real-time AI feedback.
Enable Direct Mode
Turn off Preview: Settings... → General tab → uncheck Preview.
Now when you release the hotkey, text is typed directly into the active app — no panel, no confirmation needed.
Real-Time Streaming STT
When using an ASR backend that supports streaming (currently Apple Speech), 闻字 shows a live transcription overlay during recording. Partial text appears in real-time as you speak, giving you instant feedback before you even release the hotkey.
This works in both Preview and Direct modes. In Direct mode, it is especially useful because you can see the transcription forming and decide whether to keep or cancel it.
AI Streaming Overlay
In direct mode, after recording ends, a streaming overlay appears showing the processing pipeline:
- Transcription phase — the overlay first shows the ASR result (or streams partial text if the backend supports it)
- Enhancement phase (if AI enhancement is active) — the LLM processes the text in real-time, with tokens appearing as they are generated
Controls during the overlay:
- Press Esc to cancel transcription/enhancement and discard the result
- The overlay shows token count and processing status
- Once complete, the final text is typed automatically
When to Use Direct Mode
- Chat apps where speed matters
- Terminal / command line input
- Any workflow where you trust the AI output and don't need to review
Level 7: Clipboard Enhancement
Goal: AI-enhance any text in any app, not just speech transcriptions.
How It Works
- Select text in any application.
- Press
Ctrl+Cmd+V(default hotkey). - 闻字 copies the selection, sends it to the LLM with the current enhancement mode, and outputs the result.
You can also trigger it from the menubar: click Enhance Clipboard.
Use Cases
- Select a rough draft → enhance with Proofread mode
- Select Chinese text → translate to English
- Select a task description → convert to shell command
Output Behavior
- Preview on: Result appears in the preview panel for review
- Preview off: Result replaces via clipboard
Customize the Hotkey
Edit ~/.config/WenZi/config.json:
{
"clipboard_enhance": {
"hotkey": "ctrl+cmd+v"
}
}
The hotkey format is modifier+modifier+key. See Level 12 for format details and examples.
Level 8: Custom Enhancement Modes
Goal: Create your own AI modes and chain pipelines.
Create a Custom Mode
Via Settings (easy):
- Open Settings... → AI tab → Add Mode...
- Edit the template, click Save, enter a mode ID.
Via file (flexible):
Create a .md file in ~/.config/WenZi/enhance_modes/:
---
label: Formal Email
order: 60
---
You are a professional email writing assistant.
Rewrite the user's input as a formal, polished email body.
Use appropriate greetings and closings if context suggests an email.
Maintain the original intent and key information.
Output only the email text without any explanation.
The filename (without .md) becomes the mode ID. Restart to load.
Create a Chain Mode
Chain modes run multiple steps sequentially:
---
label: 润色+翻译EN
order: 25
steps: proofread, translate_en
---
First proofreads the text, then translates to English.
(This body is documentation only — each step uses its own prompt.)
Tips for Good Prompts
- Be specific about what to do AND what NOT to do
- End with "Output only the processed text without any explanation"
- Use
ordervalues with gaps (10, 20, 30…) so you can insert modes between them
See Enhancement Mode Examples for ready-to-use templates covering email, meeting notes, translation, developer tools, and more.
Level 9: Multi-Provider Setup
Goal: Configure multiple ASR and LLM providers and switch between them.
Why Multiple Providers?
- Use a fast local model (Ollama) for simple tasks, cloud API for complex ones
- Have a backup when one provider is down
- Compare results across different models
Add Providers via Settings
LLM providers: Settings → LLM tab → Add Provider...
ASR providers: Settings → STT tab → Add Provider...
Both use the same dialog format:
name: provider-name
base_url: https://api.example.com/v1
api_key: your-key
models:
model-1
model-2
Switch at Runtime
In the Settings panel, all configured models appear as radio buttons. Click to switch — no restart needed.
In Preview Panel
You can also switch LLM and STT models directly from the preview panel's dropdowns, making it easy to compare results from different models on the same audio.
See Provider & Model Setup Guide for detailed examples covering Ollama, OpenAI, DeepSeek, Groq, OpenRouter, Qwen, and more.
Level 10: Vocabulary & Conversation History
Goal: Teach 闻字 your personal terms and maintain topic context across turns.
Vocabulary Retrieval
Problem: ASR often misrecognizes proper nouns, technical terms, and names (e.g., "萍萍" → "平平").
Solution: 闻字 builds a personal vocabulary from your correction history and uses it to improve future results.
How to Build Vocabulary
- Use Preview mode with AI enhancement — edit the result when the AI gets a term wrong.
- Each edit is logged to
~/.config/WenZi/conversation_history.jsonlwith auser_correctedflag. - Auto build (default): After every 10 corrections, vocabulary is rebuilt automatically in the background.
- Manual build: Settings → AI tab → Build Vocabulary...
Enable Vocabulary
Settings → AI tab → toggle Vocabulary (N). The number shows how many entries are indexed.
When enabled, relevant vocabulary entries are retrieved via embedding similarity and injected into the LLM prompt, helping it correct domain-specific terms.
Conversation History
Problem: Each transcription is independent — the LLM doesn't know what you just said.
Solution: 闻字 injects recent confirmed outputs into the AI prompt, so the LLM understands the current topic.
Enable
Settings → AI tab → toggle Conversation History.
How It Works
- Only preview-confirmed records are used (ensuring quality)
- Recent entries are formatted efficiently with arrow notation for corrections
- The LLM uses this context to maintain consistency (e.g., always using the correct name spelling)
Browse History
Menubar → Browse History... opens a full-featured history browser with:
- Text search — search across all transcription text fields
- Tag filters — click tag pills to filter by enhance mode (proofread, translate, etc.), STT model, LLM model, or whether corrections were made
- Time range filtering — filter by today, last 7 days, last 30 days, or all time
- Record deletion — select a record and click Delete to remove it
- Edit and save — modify the final text of any record and save changes
- Archived records — check the "Archived" toggle to include records from monthly archives
Auto-Rotation and Archiving
When conversation history exceeds 20,000 records, 闻字 automatically archives older records into monthly files under ~/.config/WenZi/conversation_history_archives/YYYY-MM.jsonl. The main history file keeps the most recent 20,000 records for fast access, while archived records remain searchable through the history browser.
See Conversation History Enhancement for technical details.
Level 11: Launcher
Goal: Use the built-in Launcher for quick access to apps, files, clipboard, bookmarks, and snippets.
The Launcher is a keyboard-driven search panel built into 闻字's scripting system. It works like Alfred or Raycast — press a hotkey, type to search, and press Enter to act.
Enable the Launcher
- Enable scripting: Settings... → General tab → Scripting toggle
- Edit
~/.config/WenZi/config.jsonand set:
{
"scripting": {
"chooser": {
"enabled": true,
"hotkey": "cmd+space"
}
}
}
- Restart 闻字.
Basic Usage
- Press
Cmd+Space(or your configured hotkey) to open the Launcher. - Start typing to search apps — results appear instantly.
- Press
Enterto open the selected app, or⌘+Enterto reveal it in Finder. - Press
Escto close.
Prefix Search
Use a prefix followed by a space to search a specific source:
| Type this | To search |
|---|---|
f readme |
Files named "readme" |
cb hello |
Clipboard entries containing "hello" |
bm github |
Bookmarks matching "github" |
sn email |
Snippets matching "email" |
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
↑ ↓ |
Navigate results |
Enter |
Open / execute |
⌘+Enter |
Reveal in Finder |
⌘1 – ⌘9 |
Quick select by position |
Esc |
Close |
Extending with Scripts
You can add custom data sources to the Launcher via scripts. See Scripting Documentation for the wz.chooser.source API.
Level 12: Fine-Tuning & Troubleshooting
Goal: Optimize your setup and solve common problems.
Settings Panel
Menubar → Settings... opens a panel with 4 tabs:
| Tab | What you can configure |
|---|---|
| General | Recording hotkeys, sound feedback, visual indicator, preview toggle, web/native preview, restart/cancel key selection, scripting toggle, custom config directory |
| STT | Local ASR model selection, remote ASR provider management |
| LLM | LLM provider and model selection, provider management |
| AI | Enhancement mode (displayed in defined order, not alphabetical), thinking mode, vocabulary, conversation history, auto build |
The Settings panel remembers the last active tab across sessions. At the bottom, toolbar buttons provide quick access to Show Config, Edit Config, and Reload Config.
Custom Config Directory
In the General tab, you can set a custom config directory to store 闻字 configuration files in a location of your choice (e.g., a synced folder). After changing the directory, 闻字 will prompt you to restart for the change to take effect.
Scripting Toggle
The General tab includes a Scripting toggle to enable or disable the scripting/plugin system. When enabled, 闻字 loads and executes Python scripts from the configured script directory. See the Scripting Documentation for details on writing plugins.
Hotkey Configuration
闻字 supports flexible hotkey configuration. The recording hotkey is configured in the Settings panel (General tab), while the clipboard enhance hotkey is set in the config file.
Hotkey Format
Hotkeys use the format modifier+modifier+key, where:
- Modifiers:
cmd(orcommand),ctrl,alt(oroption),shift - Regular keys:
a–z,0–9 - Special keys:
fn,f1–f12,esc,space - Right-side modifiers:
cmd_r,ctrl_r,alt_r,shift_r
Examples
| Hotkey | Config value | Description |
|---|---|---|
| Fn key (hold to record) | "fn" |
Default recording hotkey — single special key |
| F5 key | "f5" |
Use a function key to record |
| Ctrl+Cmd+V | "ctrl+cmd+v" |
Default clipboard enhance hotkey |
| Shift+Cmd+Space | "shift+cmd+space" |
Alternative with Space key |
| Alt+D | "alt+d" |
Option+D combination |
| Ctrl+Shift+R | "ctrl+shift+r" |
Triple modifier example |
| Ctrl+Cmd+1 | "ctrl+cmd+1" |
Number key combination |
Config File Examples
{
"hotkeys": {
"fn": true,
"f5": true
},
"clipboard_enhance": {
"hotkey": "shift+cmd+space"
}
}
Multiple recording hotkeys can be enabled simultaneously by adding entries to the hotkeys map with true values. Set to false to disable a hotkey without removing it.
Configuration File
Default location: ~/.config/WenZi/config.json
The config directory can be changed to a custom path via Settings → General → Config Directory (stored in macOS preferences, survives config file changes).
You only need to include fields you want to change — everything else uses defaults. After editing, click Reload Config in the Settings toolbar to apply without restarting.
See Configuration Reference for all options.
Logging
Logs are saved to ~/Library/Logs/WenZi/wenzi.log (5 MB rotation, 3 backups).
View logs (recommended): Menubar → View Logs... opens the built-in log viewer — the easiest way to check logs, monitor model download/loading progress, and diagnose issues in real time.
Log files are also available on disk at the path above if you prefer an external editor.
Usage Statistics
Menubar → Usage Stats opens an interactive statistics dashboard with:
- Summary cards — total transcriptions (with today's count), total tokens consumed (with cached input token breakdown), accept rate, and total recording time
- Interactive charts (powered by Chart.js) with selectable time ranges (7/14/30 days):
- Daily Transcriptions — stacked bar chart showing Direct vs Preview mode usage per day
- User Actions — stacked bar chart of Accept / Modified / Cancel actions per day
- Token Usage — stacked bar chart of Prompt / Completion / Cached tokens per day
- Enhance Modes — stacked bar chart showing usage of each enhancement mode per day
Common Issues
Text doesn't type into the app
- Check Accessibility permission in System Settings
- Try switching output method in config:
"output": {"method": "clipboard"}
Model download takes too long
- The menubar shows
DL X%during download — this is normal when switching to a model for the first time - FunASR: ~500 MB, MLX-Whisper large-v3-turbo: ~1.6 GB
- Check the log viewer for detailed progress
- If partially downloaded, delete the cache directory (
~/.cache/modelscope/for FunASR,~/.cache/huggingface/for MLX-Whisper) and restart
LLM enhancement times out
- Increase timeout: edit
config.json→ai_enhance.timeout(default: 30s) - Check if your LLM provider is reachable
- For Ollama, ensure it's running:
ollama serve
Preview panel doesn't appear
- Make sure Preview is enabled in Settings → General
- Try clicking the menubar icon to bring the app to focus
Notifications don't work during development
- Expected when running via
uv runwithout app bundling - Notifications work normally in the packaged
.appversion
Keyboard Shortcuts Summary
| Shortcut | Context | Action |
|---|---|---|
fn (hold/release) |
Global | Record / stop and transcribe |
fn + Cmd |
During recording | Restart recording (discard current audio, start new) |
fn + Space |
During recording | Cancel recording (discard audio, return to idle) |
fn + Z |
During recording | Cancel recording and show last preview history |
Ctrl+Cmd+V |
Global | Clipboard enhancement |
Cmd+Space |
Global | Open/close Launcher (if enabled) |
Enter |
Preview panel | Confirm and type text |
⌘+Enter |
Preview panel | Copy to clipboard |
Esc |
Preview panel / Streaming overlay | Cancel |
⌘1 – ⌘9 |
Preview panel | Switch enhancement mode |
⌘A/C/V/X |
Preview panel | Standard edit shortcuts |
⌘Z / ⌘⇧Z |
Preview panel | Undo / Redo |
Note: The restart key (
Cmd) and cancel key (Space) are configurable in Settings → General or via config (feedback.restart_keyandfeedback.cancel_key). Available choices:cmd,ctrl,alt,shift,space,esc.
What's Next?
You now know everything 闻字 offers. Here are some ideas to get the most out of it:
- Create modes for your workflow — meeting notes, code review comments, Slack messages
- Build chain modes — proofread → translate, or summarize → format
- Accumulate vocabulary — the more you correct, the smarter it gets
- Try different models — compare Groq's speed vs local Ollama's privacy vs OpenAI's accuracy
- Write scripts — extend 闻字 with Python scripts for custom hotkey actions (see Scripting Documentation)
- Browse Enhancement Mode Examples for inspiration
For technical details on any feature, see the documentation index.