A tiny collection of high-performance parsers and dumpers
Written in Nim

```
nimble install openparser
```
OpenParser is a collection of parsers and dumpers (serializers) for various data formats, written in Nim. It provides a simple and efficient way to parse and dump data in formats such as JSON, TOML, YAML, BSON, CSV, FBE and more.
- Parse JSON, CSV, YAML, TOML documents and more
- BSON encoding and decoding from JsonNode objects
- DotEnv parser for .env files
- Fast Binary Encoding (FBE) and decoding
- i18n GNU Gettext PO and MO file parsing and dumping
- Zero-copy JSON parsing via Memfiles for high performance and low memory usage
- Direct-to-object parsing for JSON, YAML and TOML
- CSV zero-copy parsing for large files
- RSS & Atom feed reader and writer
- Regex Engine with SIMD acceleration
- Context-aware error reporting while deserializing data
- Custom Hooks API for parsing and dumping
- Scientific notation support
- Dot notation access for nested data structures
Initially I wanted to create a simple JSON parser with fine-grained control over the parsing process (jsonl, custom hooks, error reporting, zero-copy tokenization). Then I thought it would be fun to add a YAML parser that parses YAML documents the same way as JSON. Once I started experimenting with a chatbot, I ended up creating a collection of parsers and dumpers for various data formats.
Note
Importing openparser directly produces a compile-time error; import the specific module for the data format you want to use, e.g. openparser/json for JSON parsing and dumping or openparser/csv for CSV parsing.
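As a quick illustration of the note above (the getBool accessor comes from std/json, which the JSON module re-exports):

```nim
# import openparser      # fails at compile time by design
import openparser/json   # import the format-specific module instead

echo fromJson("""{"ok":true}""")["ok"].getBool # true
```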
OpenParser provides a simple and efficient module for parsing JSON data using a zero-copy approach, which parses JSON without copying it into intermediate buffers, making it faster and more memory-efficient.
Note
OpenParser's JSON module exports std/json by default.
Here's a simple example that takes stringified JSON and parses it into a JsonNode tree structure:
```nim
import openparser/json

let data = """{"name":"Albush","age":40,"address":{"street":"456 Elm St","city":"Othertown","zip":67890},"friends":[]}"""
let jsonNode: JsonNode = fromJson(data)
echo jsonNode["name"].getStr # Albush
echo jsonNode["age"].getInt  # 40
```

The toJson function allows you to serialize Nim data structures into JSON strings:
```nim
import openparser/json

var data = %*{
  "name": "Alice",
  "age": 30,
  "isMember": true,
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "zip": 12345
  },
  "friends": ["Bob"]
}
echo toJson(data) # {"name":"Alice"...}
```

A todo for the future is to add support for pretty printing while serializing, which would generate more human-readable JSON output with indentation and line breaks.
Here's an example of using a custom parseHook to parse JSON data into Nim types that are not natively supported by the default parser:
```nim
import std/times
import openparser/json
import semver

proc parseHook*(parser: var JsonParser, v: var Semver) =
  v = parseVersion(parser.curr.value)
  parser.walk() # move the parser forward after parsing the value

proc parseHook*(parser: var JsonParser, v: var Time) =
  v = parseTime(parser.curr.value, "yyyy-MM-dd'T'hh:mm:ss'.'ffffffz", local())
  parser.walk() # move the parser forward after parsing the value
```

To determine the field name being parsed in a parseHook, you can use the currentField property available on the JsonParser object. This is an Option[string] that holds the name of the current field being parsed, if available:
```nim
if parser.currentField.isSome:
  let fieldName = parser.currentField.get()
  echo "Parsing field: ", fieldName
```
- Check the unit tests for JSON parsing and dumping with custom hooks.
- JSON API Reference
OpenParser can parse large CSV files efficiently without loading the entire file into memory, making it ideal for processing big datasets.
For example, here we'll use a ~680MB CSV dataset from Kaggle - TripAdvisor European restaurants - that contains around 1 million rows and 42 columns.
```nim
import std/times
import openparser/csv

var i = 0
let t = cpuTime()
parseFile("tripadvisor_european_restaurants.csv",
  proc(fields: openArray[CsvFieldSlice], row: int): bool =
    inc i
    for field in fields:
      discard # do something with the fields, e.g. print them
    true
)
let elapsed = cpuTime() - t
echo "Parsed ", i, " rows in ", elapsed, " seconds"
# ~0.783363 seconds on my machine
# memory usage stays minimal thanks to zero-copy parsing with memfiles
```

- Check the unit tests for CSV parsing.
- CSV API Reference
Parse YAML documents into a YamlNode tree structure or directly into Nim data structures using custom hooks, similar to JSON parsing and dumping.
```nim
import openparser/yaml

# Example target types for direct-to-object parsing
type
  Address = object
    street, city: string
    zip: int
  Person = object
    name: string
    age: int
    isMember: bool
    address: Address
    friends: seq[string]

let yamlData = """
name: Alice
age: 30
isMember: true
address:
  street: 123 Main St
  city: Anytown
  zip: 12345
friends:
  - Bob
  - Charlie
"""
let yamlNode: YamlNode = fromYaml(yamlData)
let person: Person = fromYaml(yamlData, Person) # using custom hooks to parse directly into Nim data structures
```
- Check the unit tests
- YAML API Reference
Another work-in-progress parser and dumper module, this one provides support for working with TOML documents. It parses the TOML input into a TomlNode tree structure or directly into Nim data structures using custom hooks.
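As a sketch of what that could look like, assuming the TOML module mirrors the JSON and YAML ones (the fromToml name and its direct-to-object overload are assumptions based on those patterns, not a confirmed API):

```nim
import openparser/toml

let tomlData = """
title = "Example"

[owner]
name = "Alice"
age = 30
"""

# Parse into a TomlNode tree (name assumed by analogy with fromJson/fromYaml)
let doc: TomlNode = fromToml(tomlData)
```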
OpenParser includes a regex engine that provides support for regular expression matching and searching, with SIMD acceleration for improved performance.
```nim
import openparser/regex

echo regex.match("hello world", "hello") # true
```

You can combine OpenParser's JSON parsing capabilities with BSON encoding and decoding to efficiently convert between JSON and BSON formats:
```nim
import openparser/[json, bson]

# Convert JSON to BSON
let jsonData = """{"name":"Alice","age":30,"isMember":true}"""
let bsonDoc: seq[byte] = fromJson(jsonData).toBson()

# Convert BSON back to JSON
let jsonNode: JsonNode = fromBson(bsonDoc)
echo jsonNode["name"].getStr # Alice
```

- Check the unit tests
- BSON API Reference
Most of the included parsers provide context-aware error reporting, including a snippet of the data around the error location, making it easier to identify and fix issues in the input.
```
{"name":"Alice","age":"isMember":true}
                                ^
Error (1:33) Unexpected token `:`
```
- JSON depth/size limit to prevent DoS attacks
- JSON schema validation support
- JSON skippable fields
- JSON custom field mapping
Note
Some implementations were made with the help of a chatbot (dotenv, fbe, gettext) and may be buggy or incomplete; contributions to improve them are welcome!
- Found a bug? Create a new Issue
- Wanna help? Fork it!
- Get €20 in cloud credits from Hetzner
MIT license. Made by Humans from OpenPeeps.
Copyright OpenPeeps & Contributors. All rights reserved.