You open a CSV that was perfectly fine yesterday. Now names look like JosÃ©, prices show as â‚¬19.99, and curly quotes have turned into â€œthisâ€. Nothing changed about your data. The file is broken.
It isn't broken. The data is still there, intact, but the file was saved in one character encoding and opened as if it were another. Every non-ASCII character came out as the wrong character (or characters).
This guide explains what's happening, how to detect it, and how to fix it across every tool you might actually use.
What Character Encoding Errors Look Like
The classic symptoms:
| What you see | What it means |
|---|---|
| JosÃ© instead of José | File is UTF-8, opened as Windows-1252 or Latin-1 |
| â€œhelloâ€ instead of “hello” | Smart quotes encoded in UTF-8, decoded as Windows-1252 |
| â‚¬ instead of € | Euro sign encoded in UTF-8, decoded as Windows-1252 |
| ? or █ boxes everywhere | File is Shift-JIS or GB2312, opened with a Latin encoding |
| Accent-free text (cafe instead of café) | High-byte characters were stripped during export |
The corrupted-text phenomenon has a name: mojibake (文字化け), Japanese for "character transformation." It's always a mismatch between the encoding the software wrote with and the encoding the software reads with.
In all but the last case, the underlying data is fine. You don't need to re-export from the source; you just need to tell your tool which encoding to use when reading.
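When garbled text has already been decoded and stored somewhere (pasted into a database, say), the mismatch can often be reversed in code. The following is a minimal Python sketch, assuming the text was UTF-8 that got decoded as Windows-1252. `repair_mojibake` is a hypothetical helper name, not a library function.

```python
def repair_mojibake(text: str, wrong: str = "cp1252", right: str = "utf-8") -> str:
    """Undo one round of mis-decoding.

    Re-encode with the codec that was wrongly used to read the bytes,
    then decode with the codec that should have been used. If the round
    trip fails, the text was probably not mojibake of this kind, so it
    is returned unchanged.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("JosÃ©"))   # José
print(repair_mojibake("José"))    # José (already correct, left alone)
```

This only reverses a single, known mismatch. Text that went through several conversions, or through a lossy one, may not be recoverable this way.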
Why This Happens: A 90-Second Encoding Primer
Every character in a text file is stored as a number. The encoding is the lookup table that maps numbers to characters.
- ASCII (1963): 128 characters. Works for English only.
- Windows-1252 (1985): 256 characters. Extends ASCII with European accented letters. Widely used in Windows software.
- ISO-8859-1 / Latin-1 (1987): Similar to Windows-1252, slightly different for characters 128–159.
- UTF-8 (1993): Handles every character in every human language. Backwards-compatible with ASCII. The modern standard.
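The problem: é (e with an acute accent) is stored as the single byte 0xE9 in Windows-1252. In UTF-8, 0xE9 is the lead byte of a three-byte sequence, so a lone 0xE9 followed by ordinary text is malformed; a UTF-8 decoder either raises an error or substitutes the replacement character �. The reverse mismatch produces the classic garbage: UTF-8 stores é as the two bytes 0xC3 0xA9, which Windows-1252 reads as Ã©.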
No data was lost. The bytes are still 0xE9. You just need to tell your software "read this file as Windows-1252," not UTF-8.
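The lookup-table idea is easy to see in a Python session. This short sketch decodes the same byte with different codecs, then shows the reverse mismatch; the codec names are Python's standard ones.

```python
# The same byte, read through different lookup tables.
raw = b"\xe9"

print(raw.decode("cp1252"))                    # é
print(raw.decode("latin-1"))                   # é
print(raw.decode("utf-8", errors="replace"))   # � (U+FFFD replacement character)

# The reverse mismatch: UTF-8 bytes read as Windows-1252.
utf8_bytes = "é".encode("utf-8")               # b'\xc3\xa9'
print(utf8_bytes.decode("cp1252"))             # Ã©
```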

Step 1: Identify the Actual Encoding
Before you can fix the problem, you need to know what encoding the file is actually using.
On Mac: Use the file command
file -I yourfile.csv
Typical output:
yourfile.csv: text/plain; charset=utf-8
yourfile.csv: text/plain; charset=iso-8859-1
yourfile.csv: text/plain; charset=unknown-8bit
unknown-8bit means the file has bytes above 127 that the tool can't definitively classify. Try Windows-1252 first — it's by far the most common culprit for European exports and legacy CRM dumps.
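If the file command isn't available, a rough equivalent can be sketched in Python: check for a UTF-8 BOM, then test whether the bytes are valid UTF-8, and only then fall back to a guess. `guess_encoding` is a hypothetical helper, and the Windows-1252 fallback is an assumption taken from the advice above, not a real detection.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return 'utf-8-sig', 'utf-8', or the fallback, based on the bytes alone."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        data.decode("utf-8")   # strict mode: raises on any malformed sequence
        return "utf-8"
    except UnicodeDecodeError:
        return fallback        # an assumption, not a detection


print(guess_encoding("José".encode("utf-8")))    # utf-8
print(guess_encoding("José".encode("cp1252")))   # cp1252
```

Pass the whole file's bytes where possible. A slice can cut a multi-byte character in half and make valid UTF-8 look invalid.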
Open the raw bytes in a text editor
On Mac: open the file in TextEdit, BBEdit, or VS Code. In VS Code, the detected encoding appears in the bottom status bar (e.g., UTF-8). Click it to try reinterpreting the file with a different encoding.
Check where the file came from
The source often tells you which encoding to expect:
| Source | Likely encoding |
|---|---|
| Windows Excel (pre-2019) | Windows-1252 |
| Windows Excel 2019+ (Save as CSV UTF-8) | UTF-8 with BOM |
| Mac Excel | UTF-8 |
| Google Sheets export | UTF-8 without BOM |
| SAP / legacy ERP | ISO-8859-1 or Windows-1252 |
| Japanese systems | Shift-JIS |
| Chinese systems | GB2312 / GBK |
| Modern APIs and databases | UTF-8 |
How to Fix Encoding Errors in Google Sheets
Google Sheets always exports UTF-8, but it doesn't always import your file correctly — especially if the file came from Windows.
Method 1: Use the Import Wizard
Don't drag the file into Google Sheets and don't open it from Drive without specifying the encoding. Use File → Import instead:
- Open Google Sheets and go to File → Import
- Upload the CSV
- In the import settings, find Character encoding and change it from Automatic to the encoding your file actually uses (e.g., Windows-1252 or ISO-8859-1)
- Confirm and import
This is the most reliable fix. Google Sheets will now decode the bytes correctly and all accented characters will appear as intended.
Method 2: Convert the File to UTF-8 First
If you're regularly receiving Windows-1252 files and want Google Sheets to just work without touching the import dialog, convert the file to UTF-8 before uploading.
The quickest way on Mac:
iconv -f windows-1252 -t utf-8 input.csv > output_utf8.csv
Replace windows-1252 with iso-8859-1, shift-jis, or whatever encoding file -I reported.
Now upload output_utf8.csv to Google Sheets — it will import cleanly with no encoding dialog needed.
How to Fix Encoding Errors in Excel (Mac)
Excel on Mac handles encoding better than its Windows counterpart, but still requires explicit guidance for non-UTF-8 files.
Use the Text Import Wizard
- Open a blank workbook
- Go to Data → Get Data → From Text/CSV
- Select your file
- In the preview dialog, find the File Origin dropdown and select the correct encoding (e.g., 1252: Western European (Windows))
- Click Load
The "File Origin" dropdown is Excel's way of letting you specify encoding. The numbering corresponds to Windows code pages — 1252 is Windows-1252, 65001 is UTF-8.
The UTF-8 BOM Problem (Excel-specific)
Windows Excel has a quirk: when you save a file as "CSV UTF-8 (Comma delimited) (*.csv)", it prepends a BOM (Byte Order Mark) — three invisible bytes at the start of the file (0xEF 0xBB 0xBF).
A BOM isn't harmful when the file stays in Excel, but it confuses many other tools: Python's csv module, reading the file as plain utf-8, returns the first column name as \ufeffColumn1 instead of Column1. Google Sheets handles it fine. Most Unix tools don't.
To strip a BOM from a file on Mac:
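# Check for a BOM: the first three bytes will read "ef bb bf"
hexdump -C yourfile.csv | head -1
# Strip the first three bytes (run this only if the check above found a BOM)
tail -c +4 yourfile.csv > yourfile-nobom.csv
Or with sed, in place. BSD sed on macOS does not understand \x escapes, so let the shell expand them with $'...':
LC_ALL=C sed -i '' $'1s/^\xEF\xBB\xBF//' yourfile.csv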
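The effect of the BOM on Python's csv module can be reproduced without a file. In this small sketch the sample bytes are made up for illustration; the `utf-8` and `utf-8-sig` codecs are standard Python.

```python
import csv
import io

# Bytes as a "CSV UTF-8" export would write them: BOM first, then the data.
data = b"\xef\xbb\xbfColumn1,Column2\r\n1,2\r\n"


def header(encoding: str) -> list[str]:
    """Decode the sample bytes with the given codec and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))


print(header("utf-8"))      # ['\ufeffColumn1', 'Column2']
print(header("utf-8-sig"))  # ['Column1', 'Column2']
```

Reading with `utf-8-sig` is usually the simpler fix than stripping bytes, because it removes a BOM when one is present and behaves like plain UTF-8 when it is not.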
How to Fix Encoding Errors With Python
Python gives you the most control and is the right tool for batch processing or when you need to handle multiple files.
Detect encoding with chardet
pip install chardet
import chardet
with open('yourfile.csv', 'rb') as f:
    raw = f.read(10000)  # the first ~10 KB is usually enough for detection
result = chardet.detect(raw)
print(result)
# {'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
The confidence value matters: below 0.7, try a few encodings manually.
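"Try a few encodings manually" could look like the sketch below, which decodes the same bytes with several candidates so you can compare the results by eye. `preview_decodings` and the candidate list are assumptions for illustration; adjust the list to the sources you actually deal with.

```python
def preview_decodings(raw: bytes, candidates=("utf-8", "cp1252", "shift_jis", "gbk")) -> dict:
    """Map each candidate encoding to the decoded text, or None if decoding fails."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


sample = "José,€19.99".encode("cp1252")
for name, text in preview_decodings(sample).items():
    print(f"{name:10} {text!r}")
```

A decode that succeeds is not proof of the right encoding. Single-byte codecs such as Latin-1 accept any input, so the final check is whether the text reads correctly.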
Re-read and convert with pandas
import pandas as pd
# Read with the detected encoding
df = pd.read_csv('yourfile.csv', encoding='windows-1252')
# Save as clean UTF-8 (no BOM)
df.to_csv('output_utf8.csv', index=False, encoding='utf-8')
# Save as UTF-8 with BOM (if destination is Windows Excel)
df.to_csv('output_utf8_bom.csv', index=False, encoding='utf-8-sig')
utf-8-sig is Python's codec name for UTF-8 with a BOM; pandas passes it straight through. Use it only when the output will be opened in Windows Excel.
Batch convert a directory
import pandas as pd
from pathlib import Path
source_encoding = 'windows-1252'
input_dir = Path('raw_exports/')
output_dir = Path('converted/')
output_dir.mkdir(exist_ok=True)
for csv_file in input_dir.glob('*.csv'):
    # Parse with the source encoding, then write back out as UTF-8 (no BOM)
    df = pd.read_csv(csv_file, encoding=source_encoding)
    df.to_csv(output_dir / csv_file.name, index=False, encoding='utf-8')
    print(f'Converted: {csv_file.name}')
This is the right approach if you receive weekly Windows exports from an ERP or CRM and need them clean before loading into Google Sheets or a database.

How to Fix Encoding Errors on the Mac Command Line
For quick one-off conversions, iconv and file are fast and reliable.
Convert encoding with iconv
# Windows-1252 → UTF-8
iconv -f cp1252 -t utf-8 input.csv > output.csv
# ISO-8859-1 → UTF-8
iconv -f iso-8859-1 -t utf-8 input.csv > output.csv
# Shift-JIS → UTF-8 (Japanese files)
iconv -f shift_jis -t utf-8 input.csv > output.csv
cp1252 is the iconv code for Windows-1252. Run iconv -l to see all supported encodings.
If iconv encounters a byte sequence it can't convert, it stops with an error and leaves the output file truncated at that point (possibly empty). Add -c to skip unconvertible bytes:
iconv -c -f cp1252 -t utf-8 input.csv > output.csv
This silently drops characters that can't be converted. Usually acceptable for a handful of stray bytes in a large export; not acceptable if accuracy matters on every row.
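The same trade-off exists when converting in Python, where the `errors` argument of `open` plays the role of iconv's `-c` flag. Below is a minimal sketch of a streaming converter; `transcode` and its parameter names are hypothetical.

```python
def transcode(src_path, dst_path, src_encoding="cp1252", errors="strict", chunk_chars=1 << 16):
    """Stream a text file from src_encoding to UTF-8 without loading it all into memory.

    errors="strict"  raises UnicodeDecodeError on a bad byte (like plain iconv)
    errors="ignore"  silently drops bad bytes (like iconv -c)
    errors="replace" substitutes U+FFFD so the damage stays visible
    """
    # newline="" on both sides keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, errors=errors, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while chunk := src.read(chunk_chars):
            dst.write(chunk)
```

Calling `transcode("input.csv", "output.csv", errors="replace")` is often a reasonable middle ground: the conversion finishes, and any damaged characters can be found afterwards by searching for U+FFFD.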
Common Scenarios and Quick Fixes
Scenario 1: "My CSV from Excel has garbage characters"
Windows Excel saves CSV as Windows-1252 by default when you choose "Comma Separated Values (.csv)". The modern workaround is to save as "CSV UTF-8 (Comma delimited) (*.csv)" instead — that option was added in Excel 2019.
If you can't change the export: iconv -f cp1252 -t utf-8 input.csv > output.csv before importing.
Scenario 2: "Google Sheets exported a CSV and now it doesn't work in another tool"
Google Sheets exports UTF-8 without BOM. Most tools handle this fine. If you're importing into Windows Excel and seeing issues, add a BOM: python3 -c "import sys; sys.stdout.buffer.write(b'\xef\xbb\xbf'); sys.stdout.buffer.write(open('file.csv','rb').read())" > file_bom.csv
Downloading the file again from Google Sheets via File → Download → CSV won't change anything: it will still be UTF-8 without a BOM. Either add the BOM as shown above, or tell the receiving tool to read the file as UTF-8.
Scenario 3: "Only some rows have garbled characters"
This means different rows were encoded differently — common when data was merged from multiple sources. You'll need Python with chardet to detect encoding row-by-row, or open the raw file in a hex editor to see exactly which bytes are misbehaving.
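A simpler alternative to per-row detection is to try strict UTF-8 on each line and fall back to one legacy encoding when that fails. The sketch below assumes the file mixes only UTF-8 and Windows-1252 and that no quoted field contains a line break; `decode_mixed` is a hypothetical helper.

```python
def decode_mixed(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode line by line: strict UTF-8 first, then the fallback encoding."""
    lines = []
    for line in raw.split(b"\n"):
        try:
            lines.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # errors="replace" keeps going on the few byte values cp1252 leaves undefined
            lines.append(line.decode(fallback, errors="replace"))
    return "\n".join(lines)


mixed = "José\n".encode("utf-8") + "Zoë\n".encode("cp1252")
print(decode_mixed(mixed))
```

Trying UTF-8 first works because legacy-encoded text with accented characters is very rarely valid UTF-8 by accident, whereas the reverse order would accept everything and fix nothing.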
Scenario 4: "Accented characters are missing entirely — just replaced with ?"
This happens when a converter tried to transcode and couldn't map the character, then used ? as a fallback. The original bytes are gone. You need to re-export from the source — there's no way to recover replaced data.
Scenario 5: "The file looks fine on my machine but garbled on someone else's"
Your machine is set to a regional locale that matches the file's encoding. Their machine has a different default. The fix is to explicitly export as UTF-8 so the encoding is specified rather than assumed.
How to Prevent Encoding Problems
The root cause of almost every encoding error is an assumption: the tool assumes UTF-8, the file is Windows-1252. Remove the assumption.
When exporting CSV:
- Excel on Windows: always choose "CSV UTF-8 (Comma delimited) (*.csv)", not the plain CSV option
- Excel on Mac: defaults to UTF-8, no action needed
- Google Sheets: exports UTF-8 by default — fine
- Legacy systems: check your export settings; add a BOM if recipients use Windows tools
When importing CSV:
- Always specify the encoding explicitly in the import dialog
- Don't rely on auto-detection for files from mixed-encoding sources
- Validate after import: spot-check rows with accented characters or special symbols
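The spot-check can be partly automated by scanning imported rows for character sequences that typically signal a bad decode. This is a heuristic sketch, assuming rows are lists of strings; the pattern list is illustrative rather than exhaustive, and `suspect_rows` is a hypothetical helper.

```python
import re

# Sequences that commonly appear when UTF-8 was decoded as Windows-1252.
SUSPECT = re.compile(
    "\u00c3[\u00a0-\u00bf]"      # Ã followed by a Latin-1 symbol, e.g. Ã© for é
    "|\u00e2\u20ac"              # â€ prefix, left behind by curly quotes and dashes
    "|\u00e2\u201a\u00ac"        # â‚¬, left behind by the euro sign
    "|\ufffd"                    # U+FFFD replacement character
)


def suspect_rows(rows):
    """Yield (row_number, row) for rows containing likely mojibake."""
    for number, row in enumerate(rows, start=1):
        if any(SUSPECT.search(cell) for cell in row):
            yield number, row


rows = [["José", "1"], ["JosÃ©", "2"], ["ok", "\ufffd"]]
print([number for number, _ in suspect_rows(rows)])   # [2, 3]
```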
When sharing CSV files:
- If you're not sure, convert to UTF-8 before sending: iconv -f cp1252 -t utf-8 input.csv > input_utf8.csv
- Communicate the encoding in the filename if the recipient needs to know: export_utf8.csv
Frequently Asked Questions
Q: What's the difference between UTF-8, UTF-8 with BOM, and UTF-16?
A: UTF-8 is the standard web encoding, with no BOM. UTF-8 with BOM adds three invisible bytes at the start so Windows tools can identify the encoding automatically. UTF-16 uses 2 bytes per character (or 4 for rare characters) and is used by some Windows applications for Unicode support. For CSV files, UTF-8 without BOM is the right choice for anything leaving Windows; UTF-8 with BOM if the output goes directly into Windows Excel.
Q: Why does Excel on Windows save CSVs in Windows-1252 when UTF-8 exists?
A: Legacy compatibility. Windows-1252 has been Excel's default encoding for decades. Microsoft added "CSV UTF-8" as a separate format option in Excel 2019 rather than changing the default to avoid breaking existing workflows. Old habits persist.
Q: Is iconv safe to use on large files?
A: Yes — iconv streams the input and produces output incrementally. It doesn't load the whole file into memory. You can run it on multi-GB CSV files without issue.
Q: Can I fix encoding in Google Sheets without re-importing?
A: Not really. Once the file is imported with the wrong encoding, the bytes have already been misinterpreted and the decoded text is wrong. You need to re-import with the correct encoding specified. There's no in-sheet encoding setting that reinterprets existing data.
Q: My file reports UTF-8 but still has garbled text — why?
A: The file may have a BOM marker confusing the reader, or it may have been "converted" to UTF-8 without actually correcting the underlying bytes. Open the raw bytes with hexdump -C yourfile.csv | head and check whether characters above 0x7F look like properly-formed UTF-8 multi-byte sequences or like raw Latin-1 bytes.
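The same check can be done in Python, which reports exactly where strict UTF-8 decoding fails. This is a small sketch; `first_invalid_utf8` is a hypothetical helper built on the standard `UnicodeDecodeError` attributes.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first malformed UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.end]


print(first_invalid_utf8("José".encode("utf-8")))   # None
print(first_invalid_utf8(b"Jos\xe9,42"))            # (3, b'\xe9')
```

If this returns None but the text still shows sequences like Ã©, the file is valid UTF-8 that was encoded twice, and the fix is to reverse the extra round of encoding rather than to change how the file is read.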
Q: Does CSVtoSheets handle encoding automatically?
A: Yes. CSVtoSheets detects the encoding before passing your CSV to Google Sheets, so accented characters and special symbols arrive correctly. Encoding is one of the common failure points in the manual double-click workflow on Mac, where Finder opens the file with the system locale's default encoding rather than inspecting the file.
