User Guide

Progressive guide from first launch to advanced usage.

Table of Contents

A progressive guide from first launch to advanced usage. Follow the levels in order — each builds on the previous one.

Level 1: Getting Started

Goal: Install 闻字 and transcribe your first sentence.

Install

Option A — Download Release (easiest):

  1. Download WenZi.app from the Releases page.
  2. Drag it to /Applications.
  3. Double-click to launch.

First launch: macOS blocks unsigned apps. Go to System Settings → Privacy & Security, find the 闻字 blocked message, and click Open Anyway.

Option B — Build from Source:

git clone https://github.com/Airead/WenZi
cd WenZi
uv sync
./scripts/build.sh        # builds WenZi.app in dist/

Option C — Run from Source (for developers):

git clone https://github.com/Airead/WenZi
cd WenZi
uv sync
uv run python -m wenzi

Grant Permissions

On first launch, macOS will ask for:

Permission Why
Microphone Record your voice
Accessibility Type text into other apps
Speech Recognition Only needed for Apple Speech backend

Grant all requested permissions in System Settings → Privacy & Security.

First Launch: Ready Immediately

The default ASR backend is Apple On-Device Speech — it uses the built-in macOS speech recognizer, so no model download is needed. 闻字 is ready to transcribe right after granting permissions.

Note: If you later switch to FunASR or MLX-Whisper in Settings, 闻字 will need to download a model (~75 MB to ~1.6 GB depending on the model). During download:

Tip: If a download fails or is interrupted, delete the cache directory (~/.cache/modelscope/ for FunASR, ~/.cache/huggingface/ for MLX-Whisper) and restart 闻字 to retry.

Your First Transcription

  1. Look for the microphone icon (🎙) in the menubar — that means 闻字 is ready.
  2. Open any text input (Notes, browser, editor, terminal…).
  3. Hold the fn key and speak.
  4. Release fn — the transcribed text appears.

That's it! You've completed the basic workflow.

Understanding the Menubar Icon

The menubar icon changes to reflect the current status:

Icon Status Meaning
🎙 (mic.fill) Ready Idle, ready to record
〰 (waveform) Recording... Capturing audio
💬 (text.bubble) Transcribing... Processing speech to text
✨ (sparkles) Enhancing... AI enhancement in progress
👁 (eye) Preview... Preview panel is open
⬇ (arrow.down.circle) + DL X% Downloading... Model download in progress
⚙ (cpu) Loading... Loading model into memory
⚠ (triangle) Error Something went wrong

Level 2: Daily Use Basics

Goal: Understand the two output modes and basic menubar controls.

Preview Mode vs Direct Mode

闻字 has two ways to deliver results:

Mode Behavior When to use
Preview (default) Shows a floating panel — review and edit before confirming When accuracy matters, or you want to check before typing
Direct Types text immediately into the active app When speed matters and you trust the transcription

Toggle via: menubar → Settings... → General tab → Preview checkbox.

Preview Panel Basics

When Preview is on, after recording you'll see a floating panel:

Click the microphone icon in the menubar to see the menu:

🎙
├── Ready                    (status indicator)
├── ─────────────────────
├── Enhance Clipboard        AI-enhance selected text (Ctrl+Cmd+V)
├── Browse History...        Search and browse past transcriptions
├── Settings...              Open the settings panel (4 tabs)
├── ─────────────────────
├── View Logs...             Open log viewer
├── Usage Stats              View usage statistics
├── About 闻字          Version info
└── Quit

All model selection, AI enhancement configuration, and hotkey management are done through the Settings panel — not the menubar menu directly.

Recording Feedback

While holding fn, a floating indicator with audio level bars shows you're recording. A sound plays on start and stop (configurable in Settings → General).

When the ASR backend supports streaming (e.g., Apple Speech), a live transcription overlay appears below the recording indicator, showing partial transcription text in real time as you speak. This gives you immediate visual feedback without waiting for the recording to end.

Recording Controls

While holding the recording hotkey, you can press additional keys to control the session:

Key (while holding fn) Action
Cmd (default) Restart recording — discards current audio and starts a new recording
Space (default) Cancel recording — discards audio and returns to idle
Z Show last preview — cancels recording and opens the last preview result

The restart and cancel keys can be customized in Settings → General (or via config: feedback.restart_key and feedback.cancel_key).


Level 3: Choosing the Right ASR Backend

Goal: Pick the best speech engine for your needs.

Backend Comparison

Backend Language Speed Accuracy Download Size
Apple Speech (default) Multiple Fast Good None (built-in)
FunASR Chinese Fast High (Chinese) ~500 MB
MLX-Whisper 99 languages Medium High 75 MB – 1.6 GB
Whisper API Multiple Depends on network High None (cloud)

How to Switch

Open Settings...STT tab. You'll see:

Click a radio button to switch. The model will start loading (or downloading if not yet cached).

First download reminder: When switching to a new MLX-Whisper model for the first time, the model needs to download. Watch the menubar icon for DL X% progress. Model sizes range from ~75 MB (tiny) to ~1.6 GB (large-v3-turbo).

Recommendations


Level 4: AI Enhancement

Goal: Use an LLM to proofread, translate, or reformat transcribed text.

AI enhancement is optional — by default it's off. When enabled, your transcribed text is sent to an LLM for post-processing before output.

Step 1: Set Up an LLM Provider

You need an LLM backend. Two easy options:

Option A — Local with Ollama (free, private):

  1. Install Ollama and run ollama pull qwen2.5:7b
  2. That's it — 闻字.s default config points to Ollama

Option B — Cloud API (e.g., DeepSeek, OpenAI):

  1. Open Settings...LLM tab → Add Provider...
  2. Fill in provider details:
name: deepseek
base_url: https://api.deepseek.com/v1
api_key: sk-your-key
models:
  deepseek-chat
  1. Click VerifySave

Step 2: Select an Enhancement Mode

Open Settings...AI tab. Select a mode:

Mode What it does
Off No enhancement (raw transcription)
纠错润色 (Proofread) Fix typos, grammar, punctuation
翻译为英文 (Translate EN) Translate to English
命令行大神 (Commandline) Convert speech to shell commands

Step 3: Try It

  1. Make sure an LLM provider is configured and a mode is selected.
  2. Hold fn, say something, release.
  3. The result now goes through the LLM before appearing.

Tip: Start with "纠错润色" (Proofread) — it's the most universally useful mode.


Level 5: Preview Power Features

Goal: Master the preview panel's editing and switching capabilities.

With Preview mode on and AI enhancement active, the preview panel becomes a powerful editor.

Quick Mode Switching

Press ⌘1 through ⌘9 to instantly switch enhancement modes and re-process the same audio:

Result Caching

When you switch modes in the preview panel, 闻字 caches completed results. Switching back to a previously used mode shows the cached result instantly (marked [cached]) — no API call needed.

The cache is cleared when new audio is recorded.

Preview History

闻字 keeps an in-memory history of your last 10 preview results (cleared on app restart). This lets you go back to a previous transcription without re-recording.

Web Preview Panel

The preview panel uses a modern WKWebView-based (HTML/CSS/JS) interface by default, providing a polished look with dark mode support. You can switch between the web-based and native AppKit preview in Settings → General → Web Preview toggle.

Other Preview Features

Feature How
Edit text Click the text area and type
Copy to clipboard ⌘+Enter — copies instead of typing into the active app
Toggle punctuation Check/uncheck the Punc checkbox to re-transcribe with/without punctuation
Switch STT model Use the STT dropdown in the panel
Switch LLM model Use the LLM dropdown in the panel
Play audio Click the play button to hear the recording
Save audio Click save to export the recording as a file
Google Translate Click the translate button to open Google Translate with current text

Level 6: Direct Mode & Streaming

Goal: Use 闻字 for fast, hands-free input with real-time AI feedback.

Enable Direct Mode

Turn off Preview: Settings... → General tab → uncheck Preview.

Now when you release the hotkey, text is typed directly into the active app — no panel, no confirmation needed.

Real-Time Streaming STT

When using an ASR backend that supports streaming (currently Apple Speech), 闻字 shows a live transcription overlay during recording. Partial text appears in real-time as you speak, giving you instant feedback before you even release the hotkey.

This works in both Preview and Direct modes. In Direct mode, it is especially useful because you can see the transcription forming and decide whether to keep or cancel it.

AI Streaming Overlay

In direct mode, after recording ends, a streaming overlay appears showing the processing pipeline:

  1. Transcription phase — the overlay first shows the ASR result (or streams partial text if the backend supports it)
  2. Enhancement phase (if AI enhancement is active) — the LLM processes the text in real-time, with tokens appearing as they are generated

Controls during the overlay:

When to Use Direct Mode


Level 7: Clipboard Enhancement

Goal: AI-enhance any text in any app, not just speech transcriptions.

How It Works

  1. Select text in any application.
  2. Press Ctrl+Cmd+V (default hotkey).
  3. 闻字 copies the selection, sends it to the LLM with the current enhancement mode, and outputs the result.

You can also trigger it from the menubar: click Enhance Clipboard.

Use Cases

Output Behavior

Customize the Hotkey

Edit ~/.config/WenZi/config.json:

{
  "clipboard_enhance": {
    "hotkey": "ctrl+cmd+v"
  }
}

The hotkey format is modifier+modifier+key. See Level 12 for format details and examples.


Level 8: Custom Enhancement Modes

Goal: Create your own AI modes and chain pipelines.

Create a Custom Mode

Via Settings (easy):

  1. Open Settings...AI tab → Add Mode...
  2. Edit the template, click Save, enter a mode ID.

Via file (flexible):

Create a .md file in ~/.config/WenZi/enhance_modes/:

---
label: Formal Email
order: 60
---
You are a professional email writing assistant.
Rewrite the user's input as a formal, polished email body.
Use appropriate greetings and closings if context suggests an email.
Maintain the original intent and key information.
Output only the email text without any explanation.

The filename (without .md) becomes the mode ID. Restart to load.

Create a Chain Mode

Chain modes run multiple steps sequentially:

---
label: 润色+翻译EN
order: 25
steps: proofread, translate_en
---
First proofreads the text, then translates to English.
(This body is documentation only — each step uses its own prompt.)

Tips for Good Prompts

See Enhancement Mode Examples for ready-to-use templates covering email, meeting notes, translation, developer tools, and more.


Level 9: Multi-Provider Setup

Goal: Configure multiple ASR and LLM providers and switch between them.

Why Multiple Providers?

Add Providers via Settings

LLM providers: Settings → LLM tab → Add Provider...

ASR providers: Settings → STT tab → Add Provider...

Both use the same dialog format:

name: provider-name
base_url: https://api.example.com/v1
api_key: your-key
models:
  model-1
  model-2

Switch at Runtime

In the Settings panel, all configured models appear as radio buttons. Click to switch — no restart needed.

In Preview Panel

You can also switch LLM and STT models directly from the preview panel's dropdowns, making it easy to compare results from different models on the same audio.

See Provider & Model Setup Guide for detailed examples covering Ollama, OpenAI, DeepSeek, Groq, OpenRouter, Qwen, and more.


Level 10: Vocabulary & Conversation History

Goal: Teach 闻字 your personal terms and maintain topic context across turns.

Vocabulary Retrieval

Problem: ASR often misrecognizes proper nouns, technical terms, and names (e.g., "萍萍" → "平平").

Solution: 闻字 builds a personal vocabulary from your correction history and uses it to improve future results.

How to Build Vocabulary

  1. Use Preview mode with AI enhancement — edit the result when the AI gets a term wrong.
  2. Each edit is logged to ~/.config/WenZi/conversation_history.jsonl with a user_corrected flag.
  3. Auto build (default): After every 10 corrections, vocabulary is rebuilt automatically in the background.
  4. Manual build: Settings → AI tab → Build Vocabulary...

Enable Vocabulary

Settings → AI tab → toggle Vocabulary (N). The number shows how many entries are indexed.

When enabled, relevant vocabulary entries are retrieved via embedding similarity and injected into the LLM prompt, helping it correct domain-specific terms.

Conversation History

Problem: Each transcription is independent — the LLM doesn't know what you just said.

Solution: 闻字 injects recent confirmed outputs into the AI prompt, so the LLM understands the current topic.

Enable

Settings → AI tab → toggle Conversation History.

How It Works

Browse History

Menubar → Browse History... opens a full-featured history browser with:

Auto-Rotation and Archiving

When conversation history exceeds 20,000 records, 闻字 automatically archives older records into monthly files under ~/.config/WenZi/conversation_history_archives/YYYY-MM.jsonl. The main history file keeps the most recent 20,000 records for fast access, while archived records remain searchable through the history browser.

See Conversation History Enhancement for technical details.


Level 11: Launcher

Goal: Use the built-in Launcher for quick access to apps, files, clipboard, bookmarks, and snippets.

The Launcher is a keyboard-driven search panel built into 闻字's scripting system. It works like Alfred or Raycast — press a hotkey, type to search, and press Enter to act.

Enable the Launcher

  1. Enable scripting: Settings... → General tab → Scripting toggle
  2. Edit ~/.config/WenZi/config.json and set:
{
  "scripting": {
    "chooser": {
      "enabled": true,
      "hotkey": "cmd+space"
    }
  }
}
  1. Restart 闻字.

Basic Usage

  1. Press Cmd+Space (or your configured hotkey) to open the Launcher.
  2. Start typing to search apps — results appear instantly.
  3. Press Enter to open the selected app, or ⌘+Enter to reveal it in Finder.
  4. Press Esc to close.

Use a prefix followed by a space to search a specific source:

Type this To search
f readme Files named "readme"
cb hello Clipboard entries containing "hello"
bm github Bookmarks matching "github"
sn email Snippets matching "email"

Keyboard Shortcuts

Shortcut Action
Navigate results
Enter Open / execute
⌘+Enter Reveal in Finder
⌘1⌘9 Quick select by position
Esc Close

Extending with Scripts

You can add custom data sources to the Launcher via scripts. See Scripting Documentation for the wz.chooser.source API.


Level 12: Fine-Tuning & Troubleshooting

Goal: Optimize your setup and solve common problems.

Settings Panel

Menubar → Settings... opens a panel with 4 tabs:

Tab What you can configure
General Recording hotkeys, sound feedback, visual indicator, preview toggle, web/native preview, restart/cancel key selection, scripting toggle, custom config directory
STT Local ASR model selection, remote ASR provider management
LLM LLM provider and model selection, provider management
AI Enhancement mode (displayed in defined order, not alphabetical), thinking mode, vocabulary, conversation history, auto build

The Settings panel remembers the last active tab across sessions. At the bottom, toolbar buttons provide quick access to Show Config, Edit Config, and Reload Config.

Custom Config Directory

In the General tab, you can set a custom config directory to store 闻字 configuration files in a location of your choice (e.g., a synced folder). After changing the directory, 闻字 will prompt you to restart for the change to take effect.

Scripting Toggle

The General tab includes a Scripting toggle to enable or disable the scripting/plugin system. When enabled, 闻字 loads and executes Python scripts from the configured script directory. See the Scripting Documentation for details on writing plugins.

Hotkey Configuration

闻字 supports flexible hotkey configuration. The recording hotkey is configured in the Settings panel (General tab), while the clipboard enhance hotkey is set in the config file.

Hotkey Format

Hotkeys use the format modifier+modifier+key, where:

Examples

Hotkey Config value Description
Fn key (hold to record) "fn" Default recording hotkey — single special key
F5 key "f5" Use a function key to record
Ctrl+Cmd+V "ctrl+cmd+v" Default clipboard enhance hotkey
Shift+Cmd+Space "shift+cmd+space" Alternative with Space key
Alt+D "alt+d" Option+D combination
Ctrl+Shift+R "ctrl+shift+r" Triple modifier example
Ctrl+Cmd+1 "ctrl+cmd+1" Number key combination

Config File Examples

{
  "hotkeys": {
    "fn": true,
    "f5": true
  },
  "clipboard_enhance": {
    "hotkey": "shift+cmd+space"
  }
}

Multiple recording hotkeys can be enabled simultaneously by adding entries to the hotkeys map with true values. Set to false to disable a hotkey without removing it.

Configuration File

Default location: ~/.config/WenZi/config.json

The config directory can be changed to a custom path via Settings → General → Config Directory (stored in macOS preferences, survives config file changes).

You only need to include fields you want to change — everything else uses defaults. After editing, click Reload Config in the Settings toolbar to apply without restarting.

See Configuration Reference for all options.

Logging

Logs are saved to ~/Library/Logs/WenZi/wenzi.log (5 MB rotation, 3 backups).

View logs (recommended): Menubar → View Logs... opens the built-in log viewer — the easiest way to check logs, monitor model download/loading progress, and diagnose issues in real time.

Log files are also available on disk at the path above if you prefer an external editor.

Usage Statistics

Menubar → Usage Stats opens an interactive statistics dashboard with:

Common Issues

Text doesn't type into the app

Model download takes too long

LLM enhancement times out

Preview panel doesn't appear

Notifications don't work during development

Keyboard Shortcuts Summary

Shortcut Context Action
fn (hold/release) Global Record / stop and transcribe
fn + Cmd During recording Restart recording (discard current audio, start new)
fn + Space During recording Cancel recording (discard audio, return to idle)
fn + Z During recording Cancel recording and show last preview history
Ctrl+Cmd+V Global Clipboard enhancement
Cmd+Space Global Open/close Launcher (if enabled)
Enter Preview panel Confirm and type text
⌘+Enter Preview panel Copy to clipboard
Esc Preview panel / Streaming overlay Cancel
⌘1⌘9 Preview panel Switch enhancement mode
⌘A/C/V/X Preview panel Standard edit shortcuts
⌘Z / ⌘⇧Z Preview panel Undo / Redo

Note: The restart key (Cmd) and cancel key (Space) are configurable in Settings → General or via config (feedback.restart_key and feedback.cancel_key). Available choices: cmd, ctrl, alt, shift, space, esc.


What's Next?

You now know everything 闻字 offers. Here are some ideas to get the most out of it:

For technical details on any feature, see the documentation index.

Why Error Correction Is So Powerful →

Five layers of correction that get smarter every time you use it.