Metadata-Version: 2.1
Name: csvw
Version: 3.5.1
Summary: Python library to work with CSVW described tabular data
Home-page: https://github.com/cldf/csvw
Author: Robert Forkel
Author-email: robert_forkel@eva.mpg.de
License: Apache 2.0
Project-URL: Bug Tracker, https://github.com/cldf/csvw/issues
Keywords: csv,w3c,tabular-data
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: attrs>=18.1
Requires-Dist: isodate
Requires-Dist: python-dateutil
Requires-Dist: rfc3986<2
Requires-Dist: uritemplate>=3.0.0
Requires-Dist: babel
Requires-Dist: requests
Requires-Dist: language-tags
Requires-Dist: rdflib
Requires-Dist: colorama
Requires-Dist: jsonschema
Provides-Extra: dev
Requires-Dist: flake8; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx<7; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
Requires-Dist: sphinx-rtd-theme; extra == "docs"
Provides-Extra: test
Requires-Dist: frictionless; extra == "test"
Requires-Dist: pytest>=5; extra == "test"
Requires-Dist: pytest-mock; extra == "test"
Requires-Dist: requests-mock; extra == "test"
Requires-Dist: pytest-cov; extra == "test"

# csvw

[![Build Status](https://github.com/cldf/csvw/workflows/tests/badge.svg)](https://github.com/cldf/csvw/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/csvw.svg)](https://pypi.org/project/csvw)
[![Documentation Status](https://readthedocs.org/projects/csvw/badge/?version=latest)](https://csvw.readthedocs.io/en/latest/?badge=latest)

This package provides

- a Python API to read and write relational, tabular data according to the [CSV on the Web](https://csvw.org/) specification and
- command-line tools for reading and validating CSVW data.

## Links

- GitHub: https://github.com/cldf/csvw
- PyPI: https://pypi.org/project/csvw
- Issue Tracker: https://github.com/cldf/csvw/issues

## Installation

This package runs under Python >=3.8; use pip to install:

```bash
$ pip install csvw
```

## CLI

### `csvw2json`

Converting CSVW data [to JSON](https://www.w3.org/TR/csv2json/):

```shell
$ csvw2json tests/fixtures/zipped-metadata.json
{
    "tables": [
        {
            "url": "tests/fixtures/zipped.csv",
            "row": [
                {
                    "url": "tests/fixtures/zipped.csv#row=2",
                    "rownum": 1,
                    "describes": [
                        {
                            "ID": "abc",
                            "Value": "the value"
                        }
                    ]
                },
                {
                    "url": "tests/fixtures/zipped.csv#row=3",
                    "rownum": 2,
                    "describes": [
                        {
                            "ID": "cde",
                            "Value": "another one"
                        }
                    ]
                }
            ]
        }
    ]
}
```

### `csvwvalidate`

Validating CSVW data:

```shell
$ csvwvalidate tests/fixtures/zipped-metadata.json
OK
```

### `csvwdescribe`

Describing tabular-data files with CSVW metadata:

```shell
$ csvwdescribe --delimiter "|" tests/fixtures/frictionless-data.csv
{
    "@context": "http://www.w3.org/ns/csvw",
    "dc:conformsTo": "data-package",
    "tables": [
        {
            "dialect": {
                "delimiter": "|"
            },
            "tableSchema": {
                "columns": [
                    {
                        "datatype": "string",
                        "name": "FK"
                    },
                    {
                        "datatype": "integer",
                        "name": "Year"
                    },
                    {
                        "datatype": "string",
                        "name": "Location name"
                    },
                    {
                        "datatype": "string",
                        "name": "Value"
                    },
                    {
                        "datatype": "string",
                        "name": "binary"
                    },
                    {
                        "datatype": "string",
                        "name": "anyURI"
                    },
                    {
                        "datatype": "string",
                        "name": "email"
                    },
                    {
                        "datatype": "string",
                        "name": "boolean"
                    },
                    {
                        "datatype": {
                            "dc:format": "application/json",
                            "base": "json"
                        },
                        "name": "array"
                    },
                    {
                        "datatype": {
                            "dc:format": "application/json",
                            "base": "json"
                        },
                        "name": "geojson"
                    }
                ]
            },
            "url": "tests/fixtures/frictionless-data.csv"
        }
    ]
}
```

## Python API

Find the Python API documentation at [csvw.readthedocs.io](https://csvw.readthedocs.io/en/latest/).

A quick example for using `csvw` from Python code:

```python
import json

from csvw import CSVW

# Read a (remote) TSV file and serialize its data as minimal CSVW JSON.
data = CSVW('https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/test.tsv')
print(json.dumps(data.to_json(minimal=True), indent=4))
# [
#     {
#         "province": "Hello",
#         "territory": "world",
#         "precinct": "1"
#     }
# ]
```

## Known limitations

- We read **all** data which is specified as UTF-8 encoded using the [`utf-8-sig` codec](https://docs.python.org/3/library/codecs.html#module-encodings.utf_8_sig). Thus, if such data starts with `U+FEFF`, this will be interpreted as a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) and skipped.
- Low-level CSV parsing is delegated to the `csv` module in Python's standard library. Thus, if a `commentPrefix` is specified in a `Dialect` instance, rows whose first value starts with `commentPrefix` will be skipped, **even if the value was quoted**.
- Also, cell content containing `escapechar` may not be round-tripped as expected (when specifying `escapechar`, or a `csvw.Dialect` with `quoteChar` but `doubleQuote==False`) when minimal quoting is specified. This is due to inconsistent `csv` behaviour across Python versions (see https://bugs.python.org/issue44861).

## CSVW conformance

While we use the CSVW specification as a guideline, this package does not (and probably never will) implement the full extent of this spec.
- When CSV files with a header are read, columns are not matched in order with column descriptions in the `tableSchema`, but instead are matched based on the CSV column header and the column descriptions' `name` and `titles` attributes. This allows for more flexibility, because columns in the CSV file may be re-ordered without invalidating the metadata. A stricter matching can be forced by specifying `"header": false` and `"skipRows": 1` in the table's dialect description.

However, `csvw.CSVW` works correctly for

- 269 out of 270 [JSON tests](https://w3c.github.io/csvw/tests/#manifest-json),
- 280 out of 282 [validation tests](https://w3c.github.io/csvw/tests/#manifest-validation),
- 10 out of 18 [non-normative tests](https://w3c.github.io/csvw/tests/#manifest-nonnorm)

from the [CSVW Test suites](https://w3c.github.io/csvw/tests/).

## Compatibility with [Frictionless Data Specs](https://specs.frictionlessdata.io/)

A CSVW-described dataset is essentially equivalent to a Frictionless DataPackage where all [Data Resources](https://specs.frictionlessdata.io/data-resource/) are [Tabular Data](https://specs.frictionlessdata.io/tabular-data-resource/). Thus, the `csvw` package provides some conversion functionality.

To read CSVW data from a Data Package, there's the `csvw.TableGroup.from_frictionless_datapackage` method:

```python
from csvw import TableGroup

tg = TableGroup.from_frictionless_datapackage('PATH/TO/datapackage.json')
```

To convert the metadata, the `TableGroup` can then be serialized:

```python
# Write the metadata next to datapackage.json, so relative resource paths still resolve.
tg.to_file('PATH/TO/csvw-metadata.json')
```

Note that the CSVW metadata file must be written to the Data Package's directory to make sure relative paths to data resources work.
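Why the location matters: in CSVW metadata, a table's relative `url` is resolved against the location of the metadata document (absent an explicit `@base`). The following standard-library sketch uses `urllib.parse.urljoin` to show how the same relative URL resolves differently depending on where the metadata file is written. The `resolve_table_url` helper and the `pkg/`, `out/` and `data/values.csv` paths are hypothetical and not part of `csvw`:

```python
from urllib.parse import urljoin


def resolve_table_url(metadata_location, table_url):
    """Resolve a table's relative url against the metadata file's location.

    Hypothetical helper for illustration only; it ignores @base and other
    details of CSVW URL handling.
    """
    return urljoin(metadata_location, table_url)


# Metadata written into the Data Package directory: the relative url
# points at the data file inside that directory.
print(resolve_table_url("pkg/csvw-metadata.json", "data/values.csv"))
# pkg/data/values.csv

# Metadata written somewhere else: the same relative url now points
# at a path that (presumably) does not contain the data.
print(resolve_table_url("out/csvw-metadata.json", "data/values.csv"))
# out/data/values.csv
```

The same resolution applies to metadata published on the web, e.g. a table URL relative to `https://example.org/pkg/csvw-metadata.json`.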
This functionality, together with the schema inference capabilities of [`frictionless describe`](https://framework.frictionlessdata.io/docs/guides/describing-data/), provides a convenient way to bootstrap CSVW metadata for a set of "raw" CSV files, implemented in the [`csvwdescribe` command described above](#csvwdescribe).

## See also

- https://www.w3.org/2013/csvw/wiki/Main_Page
- https://csvw.org
- https://github.com/CLARIAH/COW
- https://github.com/CLARIAH/ruminator
- https://github.com/bloomberg/pycsvw
- https://specs.frictionlessdata.io/table-schema/
- https://github.com/theodi/csvlint.rb
- https://github.com/ruby-rdf/rdf-tabular
- https://github.com/rdf-ext/rdf-parser-csvw
- https://github.com/Robsteranium/csvwr

## License

This package is distributed under the [Apache 2.0 license](https://opensource.org/licenses/Apache-2.0).