CSV and JSON are two of the most common data interchange formats, yet converting between them is surprisingly tricky. CSV looks simple — values separated by commas — but quoted fields, escaped characters, and inconsistent formatting make robust parsing challenging. Regex provides a precise way to handle these edge cases.

The CSV Parsing Challenge

A naive approach of splitting on commas breaks as soon as a field contains a comma: `"Smith, John",30,"New York"` has three fields, but splitting on every comma yields four pieces. Proper CSV parsing must handle quoted fields, escaped quotes within fields, and the difference between empty fields and missing fields.
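The failure is easy to reproduce. The short Python snippet below is only an illustration (the variable names are made up for this example) of what a plain comma split does to that line:

```python
# Naive comma splitting applied to a line with a quoted, comma-containing field.
line = '"Smith, John",30,"New York"'

naive_fields = line.split(",")
print(naive_fields)
# ['"Smith', ' John"', '30', '"New York"']
```

The name is cut in two, and the surrounding quotes are left attached to the pieces.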

A Robust CSV Field Pattern

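The regex `"([^"]*(?:""[^"]*)*)"|([^,\n]*)` matches either a quoted or an unquoted CSV field. The first alternative handles quoted fields, where a literal quote is written as two quotes (`""`). Capture group 1 holds the field content with those doubled quotes still in place, so they must be collapsed to single quotes after matching. The second alternative handles plain unquoted fields. Because it can also match an empty string, the pattern works best when applied field by field, consuming the comma between matches, rather than as an unanchored global search. Used this way, it correctly parses most comma-delimited data that follows the doubled-quote convention.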
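As a minimal sketch of how the pattern might be applied, the Python function below walks a single line field by field. The name `parse_line` and the decision to raise `ValueError` on malformed input are assumptions made for this example, and the line is assumed to have its line ending already removed.

```python
import re

# The field pattern from the text: a quoted field or an unquoted one.
FIELD = re.compile(r'"([^"]*(?:""[^"]*)*)"|([^,\n]*)')

def parse_line(line):
    """Split one CSV line (without its line ending) into a list of fields."""
    fields = []
    pos = 0
    while True:
        # The second alternative can match an empty string, so this always matches.
        m = FIELD.match(line, pos)
        if m.group(1) is not None:
            # Quoted field: collapse doubled quotes into single quotes.
            fields.append(m.group(1).replace('""', '"'))
        else:
            fields.append(m.group(2))
        pos = m.end()
        if pos == len(line):
            return fields
        if line[pos] != ",":
            raise ValueError(f"unexpected character at position {pos}")
        pos += 1  # step over the comma to the next field

print(parse_line('"Smith, John",30,"New York"'))
# ['Smith, John', '30', 'New York']
print(parse_line('"He said ""hi""",,x'))
# ['He said "hi"', '', 'x']
```

Matching at an explicit position and then stepping over the comma avoids the spurious empty matches that a global search with this pattern would produce.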

Building the Conversion Pipeline

A CSV-to-JSON conversion pipeline has three steps: split the file into lines, parse each line into fields using regex, and map fields to JSON object properties using the header row as keys. The regex handles step two, the most error-prone part, while simple string operations handle the rest. Splitting on line breaks in step one is only safe when no quoted field contains a newline.
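The three steps can be sketched in Python as follows. This is an illustrative outline rather than a finished converter: `parse_line` and `csv_to_json` are hypothetical names, blank lines are skipped, every value is kept as a string, and the input is assumed to have no newlines inside quoted fields.

```python
import json
import re

FIELD = re.compile(r'"([^"]*(?:""[^"]*)*)"|([^,\n]*)')

def parse_line(line):
    """Step 2: split one line into fields using the regex."""
    fields, pos = [], 0
    while True:
        m = FIELD.match(line, pos)
        quoted, plain = m.groups()
        fields.append(quoted.replace('""', '"') if quoted is not None else plain)
        pos = m.end()
        if pos >= len(line) or line[pos] != ",":
            return fields
        pos += 1  # skip the comma

def csv_to_json(text):
    # Step 1: split into lines, dropping blank ones.
    lines = [ln for ln in text.splitlines() if ln]
    if not lines:
        return "[]"
    # Step 3: use the header row as keys for every following row.
    header = parse_line(lines[0])
    records = [dict(zip(header, parse_line(ln))) for ln in lines[1:]]
    return json.dumps(records, indent=2)

sample = 'name,age,city\n"Smith, John",30,"New York"\n'
print(csv_to_json(sample))
```

Note that `zip` silently drops extra fields when a row is longer than the header, and leaves keys out when it is shorter. A stricter converter would probably want to report such rows.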

Handling Edge Cases

Real-world CSV files have inconsistencies: mixed line endings (`\r\n` vs `\n`), trailing commas, a byte order mark (BOM) at the start of the file, and fields with newlines inside quotes. A production-quality converter must handle all of these. Regex patterns can be composed to address each case individually.
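One possible way to compose patterns for these cases is sketched below in Python. The names `split_records` and `parse_record`, and the `width` parameter, are hypothetical. The sketch also makes two assumptions worth stating: line endings are normalized everywhere, including inside quoted fields, and a trailing comma is treated as noise rather than as an extra empty column. Input with an unterminated quote is not handled.

```python
import re

# A record is a run of quoted chunks (which may contain newlines)
# and characters that are neither a quote nor a newline.
RECORD = re.compile(r'(?:"[^"]*(?:""[^"]*)*"|[^"\n])+')
FIELD = re.compile(r'"([^"]*(?:""[^"]*)*)"|([^,\n]*)')

def split_records(text):
    """Strip a leading BOM, normalize line endings, and split into records."""
    if text.startswith("\ufeff"):
        text = text[1:]
    text = re.sub(r"\r\n?", "\n", text)  # \r\n and bare \r both become \n
    return RECORD.findall(text)

def parse_record(record, width=None):
    """Split a record into fields; drop empty trailing fields beyond `width`."""
    fields, pos = [], 0
    while True:
        m = FIELD.match(record, pos)
        quoted, plain = m.groups()
        fields.append(quoted.replace('""', '"') if quoted is not None else plain)
        pos = m.end()
        if pos >= len(record) or record[pos] != ",":
            break
        pos += 1
    if width is not None:
        # Assumption: extra empty fields at the end come from trailing commas.
        while len(fields) > width and fields[-1] == "":
            fields.pop()
    return fields

text = '\ufeffname,note\r\n"Smith, John","line one\r\nline two"\r\nJane,ok,\r\n'
records = split_records(text)
header = parse_record(records[0])
rows = [parse_record(r, width=len(header)) for r in records[1:]]
print(header)
# ['name', 'note']
print(rows)
# [['Smith, John', 'line one\nline two'], ['Jane', 'ok']]
```

Splitting into records with a quote-aware pattern, instead of splitting on every line break, is what keeps the two-line note in a single field.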

When Not to Use Regex

For very large CSV files (millions of rows), a dedicated streaming CSV parser is more efficient than regex. For complex nested JSON output, a scripting language with proper CSV and JSON libraries is more maintainable. Regex shines for quick conversions, data exploration, and cases where a full parser is unavailable.
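For comparison, here is a small sketch of the library-based route using Python's standard `csv` and `json` modules. The function name `rows_as_json_lines` and the choice to emit one JSON object per row are assumptions made for this example. `csv.DictReader` reads rows lazily, so the whole file does not need to fit in memory.

```python
import csv
import io
import json

def rows_as_json_lines(csv_file):
    """Yield one JSON object string per CSV row, reading rows lazily."""
    reader = csv.DictReader(csv_file)  # the first row supplies the keys
    for row in reader:
        yield json.dumps(row)

# An in-memory stand-in for a real file, for demonstration only.
sample = io.StringIO('name,age,city\r\n"Smith, John",30,"New York"\r\n', newline="")
for line in rows_as_json_lines(sample):
    print(line)
# {"name": "Smith, John", "age": "30", "city": "New York"}
```

When reading from disk, the `csv` documentation recommends opening the file with `newline=""` so that newlines inside quoted fields are passed to the parser unchanged.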

Testing CSV Patterns

RegExpress for iOS lets you paste sample CSV data and test parsing patterns interactively. This is invaluable for developing patterns that handle your specific data’s quirks before writing the conversion code.