Methodology

How cleaning works

CleanSheet is deterministic: the same input always produces the same output. Every change is shown to you before it is applied, and nothing is guessed or invented. This page documents every rule the tool uses, including its assumptions, so you can decide whether it fits your data.

Column detection

The tool samples up to the first 100 rows of each column and tests those values against the pattern for each supported type. A column is assigned a type (phone, email, date, ZIP code, or number) only when at least 60% of its sampled values match that type. A column that falls below that threshold is treated as plain text and receives only whitespace and capitalization cleanup. A structured-looking column that fails detection is never force-converted.
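The sampling and threshold rule can be sketched in Python. This is a minimal illustration, not CleanSheet's code: the regular expressions in PATTERNS are hypothetical placeholders (the real patterns are not published on this page), the date pattern covers only numeric forms, blank cells are skipped, and when several types clear 60% the one with the highest share wins, with ties going to the earlier entry. The blank-cell and tie-breaking choices are assumptions.

```python
import re

# Hypothetical placeholder patterns, checked in this order; NOT CleanSheet's real rules.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "date": re.compile(r"\d{1,4}[/-]\d{1,2}[/-]\d{1,4}"),  # numeric forms only
    "zip": re.compile(r"\d{5}(?:-?\d{4})?"),
    "phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
    "number": re.compile(r"-?\d+(?:\.\d+)?"),
}

SAMPLE_SIZE = 100  # only the first 100 rows are examined
THRESHOLD = 0.60   # a type must match at least 60% of the sample


def detect_type(values: list[str]) -> str:
    """Return the detected column type, or "text" if no type reaches the threshold."""
    # Assumption: blank cells are excluded from the sample.
    sample = [v.strip() for v in values[:SAMPLE_SIZE] if v.strip()]
    if not sample:
        return "text"
    best_type, best_share = "text", 0.0
    for name, pattern in PATTERNS.items():
        share = sum(1 for v in sample if pattern.fullmatch(v)) / len(sample)
        # Strict ">" means ties go to the type listed first in PATTERNS.
        if share >= THRESHOLD and share > best_share:
            best_type, best_share = name, share
    return best_type
```

For example, a column where 7 of 10 sampled values are email addresses is typed as email, while a 5-of-10 column stays plain text and is left for whitespace and capitalization cleanup only.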

Phone numbers

Email addresses

Emails are lowercased and trimmed. For example, Jane.Doe@Example.COM becomes jane.doe@example.com. Invalid-looking emails are left unchanged rather than “fixed.”
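A minimal Python sketch of this rule, assuming a deliberately loose, hypothetical “looks like an email” check (one @, no spaces, a dot in the domain). The tool's actual validity test is not documented here. Following the text, a value that fails the check is returned exactly as given, without trimming.

```python
import re

# Hypothetical validity check; CleanSheet's real rule is not published on this page.
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def clean_email(value: str) -> str:
    """Trim and lowercase a plausible email; leave anything else untouched."""
    candidate = value.strip()
    if EMAIL_SHAPE.fullmatch(candidate):
        return candidate.lower()
    # Invalid-looking: returned exactly as given, never "fixed".
    return value
```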

Dates

Recognized dates are standardized to ISO format (YYYY-MM-DD), which sorts correctly in every spreadsheet tool. Recognized input formats include 12/25/2023, 2023-1-5, 12-25-2023, Jan 3 2024, and 3 Jan 2024. European dotted dates (25.12.2023) are not yet supported and are left unchanged.

Important assumption: ambiguous slash dates are read as US month-first. 4/1/24 becomes 2024-04-01 (April 1), not January 4. If your data uses day-first (UK/EU) dates, do not enable date cleaning on that column yet; a day-first option is on the roadmap. Unparseable values are left unchanged.
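The date rules can be sketched in Python as follows. This is an illustration, not the tool's implementation. It handles only the five input forms listed above, reads slash and dash dates month-first, and treats a two-digit year as 20xx; that pivot is an assumption consistent with the 4/1/24 example, not a documented rule. Month names are parsed with datetime.strptime, whose %b code depends on the current locale (English in the default C locale). Anything unrecognized, including dotted dates and impossible dates such as 13/40/2023, is returned unchanged.

```python
import re
from datetime import date, datetime

# 2023-1-5 style: year first, then month, then day.
YMD = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")
# 12/25/2023, 12-25-2023, 4/1/24: read month-first; the separator must be consistent.
MDY = re.compile(r"(\d{1,2})([/-])(\d{1,2})\2(\d{2}|\d{4})")
# Jan 3 2024 and 3 Jan 2024 (%b is locale-dependent).
TEXT_FORMATS = ("%b %d %Y", "%d %b %Y")


def to_iso(value: str) -> str:
    """Return YYYY-MM-DD for a recognized date, otherwise the original value."""
    s = value.strip()
    ymd = None
    if m := YMD.fullmatch(s):
        ymd = (int(m[1]), int(m[2]), int(m[3]))
    elif m := MDY.fullmatch(s):
        year = int(m[4])
        if len(m[4]) == 2:
            year += 2000  # assumption: two-digit years are read as 20xx
        ymd = (year, int(m[1]), int(m[3]))  # month-first (US) reading
    else:
        for fmt in TEXT_FORMATS:
            try:
                return datetime.strptime(s, fmt).date().isoformat()
            except ValueError:
                continue
    if ymd is None:
        return value  # unrecognized, e.g. 25.12.2023
    try:
        return date(*ymd).isoformat()
    except ValueError:
        return value  # not a real calendar date
```

Note how 4/1/24 becomes 2024-04-01: the sketch, like the rule it illustrates, has no way to tell that a day-first file meant January 4.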

Names and text

ZIP codes

5-digit ZIPs are kept as 5 digits; 9-digit ZIPs become ZIP+4 format (12345-6789). Other postal formats are left unchanged.
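A short Python sketch of the ZIP rule, assuming a 9-digit ZIP may arrive with no separator, a hyphen, or a single space between the two parts (the accepted separators are an assumption). Everything else, including 4-digit values and non-US postcodes, comes back unchanged.

```python
import re

ZIP5 = re.compile(r"\d{5}")
# Assumption: 9-digit ZIPs may use no separator, "-", or a single space.
ZIP9 = re.compile(r"(\d{5})[- ]?(\d{4})")


def clean_zip(value: str) -> str:
    """Keep 5-digit ZIPs, format 9-digit ZIPs as ZIP+4, leave the rest alone."""
    s = value.strip()
    if ZIP5.fullmatch(s):
        return s
    if m := ZIP9.fullmatch(s):
        return f"{m[1]}-{m[2]}"
    return value  # other postal formats are left unchanged
```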

What the tool never does

Limits

Free tier: CSV files up to 2 MB or 2,500 rows. Excel files are not supported yet; until they are, export your sheet from Excel with File > Save As > CSV. Exports are deleted from our server after 24 hours. Credit packs for larger files are coming; join the waitlist on the homepage.

How the rules improve

When the tool encounters a value it cannot clean, it records only the anonymized shape of that value (letters become “a”, digits become “9”, so 555-0123 is stored as 999-9999). Real data is never stored for this purpose. We review these shapes weekly and add new rules by hand. Questions about a specific format? Ask us.
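The shape mapping can be sketched in a few lines of Python. This assumes, as the 555-0123 example suggests, that punctuation and spaces are kept as they are; it also treats uppercase and non-ASCII letters as letters, which is an assumption the text does not spell out.

```python
def value_shape(value: str) -> str:
    """Anonymize a value: letters -> "a", digits -> "9", other characters kept."""
    return "".join(
        "a" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )
```

Because only the character classes survive, two different phone numbers such as 555-0123 and 867-5309 collapse to the same shape, 999-9999, which is what makes the stored record useful for spotting formats without keeping the data itself.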