302 lines
9.9 KiB
Plaintext
302 lines
9.9 KiB
Plaintext
Metadata-Version: 2.1
|
|
Name: csvw
|
|
Version: 3.5.1
|
|
Summary: Python library to work with CSVW described tabular data
|
|
Home-page: https://github.com/cldf/csvw
|
|
Author: Robert Forkel
|
|
Author-email: robert_forkel@eva.mpg.de
|
|
License: Apache 2.0
|
|
Project-URL: Bug Tracker, https://github.com/cldf/csvw/issues
|
|
Keywords: csv,w3c,tabular-data
|
|
Platform: any
|
|
Classifier: Development Status :: 5 - Production/Stable
|
|
Classifier: Intended Audience :: Developers
|
|
Classifier: Intended Audience :: Science/Research
|
|
Classifier: Natural Language :: English
|
|
Classifier: Operating System :: OS Independent
|
|
Classifier: Programming Language :: Python :: 3
|
|
Classifier: Programming Language :: Python :: 3.8
|
|
Classifier: Programming Language :: Python :: 3.9
|
|
Classifier: Programming Language :: Python :: 3.10
|
|
Classifier: Programming Language :: Python :: 3.11
|
|
Classifier: Programming Language :: Python :: 3.12
|
|
Classifier: Programming Language :: Python :: 3.13
|
|
Classifier: Programming Language :: Python :: Implementation :: CPython
|
|
Classifier: Programming Language :: Python :: Implementation :: PyPy
|
|
Classifier: License :: OSI Approved :: Apache Software License
|
|
Requires-Python: >=3.8
|
|
Description-Content-Type: text/markdown
|
|
License-File: LICENSE
|
|
Requires-Dist: attrs>=18.1
|
|
Requires-Dist: isodate
|
|
Requires-Dist: python-dateutil
|
|
Requires-Dist: rfc3986<2
|
|
Requires-Dist: uritemplate>=3.0.0
|
|
Requires-Dist: babel
|
|
Requires-Dist: requests
|
|
Requires-Dist: language-tags
|
|
Requires-Dist: rdflib
|
|
Requires-Dist: colorama
|
|
Requires-Dist: jsonschema
|
|
Provides-Extra: dev
|
|
Requires-Dist: flake8; extra == "dev"
|
|
Requires-Dist: wheel; extra == "dev"
|
|
Requires-Dist: twine; extra == "dev"
|
|
Requires-Dist: build; extra == "dev"
|
|
Provides-Extra: docs
|
|
Requires-Dist: sphinx<7; extra == "docs"
|
|
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
|
|
Requires-Dist: sphinx-rtd-theme; extra == "docs"
|
|
Provides-Extra: test
|
|
Requires-Dist: frictionless; extra == "test"
|
|
Requires-Dist: pytest>=5; extra == "test"
|
|
Requires-Dist: pytest-mock; extra == "test"
|
|
Requires-Dist: requests-mock; extra == "test"
|
|
Requires-Dist: pytest-cov; extra == "test"
|
|
|
|
# csvw
|
|
|
|
[](https://github.com/cldf/csvw/actions?query=workflow%3Atests)
|
|
[](https://pypi.org/project/csvw)
|
|
[](https://csvw.readthedocs.io/en/latest/?badge=latest)
|
|
|
|
|
|
This package provides
|
|
- a Python API to read and write relational, tabular data according to the [CSV on the Web](https://csvw.org/) specification and
|
|
- commandline tools for reading and validating CSVW data.
|
|
|
|
|
|
## Links
|
|
|
|
- GitHub: https://github.com/cldf/csvw
|
|
- PyPI: https://pypi.org/project/csvw
|
|
- Issue Tracker: https://github.com/cldf/csvw/issues
|
|
|
|
|
|
## Installation
|
|
|
|
This package runs under Python >=3.8, use pip to install:
|
|
|
|
```bash
|
|
$ pip install csvw
|
|
```
|
|
|
|
|
|
## CLI
|
|
|
|
### `csvw2json`
|
|
|
|
Converting CSVW data [to JSON](https://www.w3.org/TR/csv2json/)
|
|
|
|
```shell
|
|
$ csvw2json tests/fixtures/zipped-metadata.json
|
|
{
|
|
"tables": [
|
|
{
|
|
"url": "tests/fixtures/zipped.csv",
|
|
"row": [
|
|
{
|
|
"url": "tests/fixtures/zipped.csv#row=2",
|
|
"rownum": 1,
|
|
"describes": [
|
|
{
|
|
"ID": "abc",
|
|
"Value": "the value"
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"url": "tests/fixtures/zipped.csv#row=3",
|
|
"rownum": 2,
|
|
"describes": [
|
|
{
|
|
"ID": "cde",
|
|
"Value": "another one"
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### `csvwvalidate`
|
|
|
|
Validating CSVW data
|
|
|
|
```shell
|
|
$ csvwvalidate tests/fixtures/zipped-metadata.json
|
|
OK
|
|
```
|
|
|
|
### `csvwdescribe`
|
|
|
|
Describing tabular-data files with CSVW metadata
|
|
|
|
```shell
|
|
$ csvwdescribe --delimiter "|" tests/fixtures/frictionless-data.csv
|
|
{
|
|
"@context": "http://www.w3.org/ns/csvw",
|
|
"dc:conformsTo": "data-package",
|
|
"tables": [
|
|
{
|
|
"dialect": {
|
|
"delimiter": "|"
|
|
},
|
|
"tableSchema": {
|
|
"columns": [
|
|
{
|
|
"datatype": "string",
|
|
"name": "FK"
|
|
},
|
|
{
|
|
"datatype": "integer",
|
|
"name": "Year"
|
|
},
|
|
{
|
|
"datatype": "string",
|
|
"name": "Location name"
|
|
},
|
|
{
|
|
"datatype": "string",
|
|
"name": "Value"
|
|
},
|
|
{
|
|
"datatype": "string",
|
|
"name": "binary"
|
|
},
|
|
{
|
|
"datatype": "string",
|
|
"name": "anyURI"
|
|
},
|
|
{
|
|
"datatype": "string",
|
|
"name": "email"
|
|
},
|
|
{
|
|
"datatype": "string",
|
|
"name": "boolean"
|
|
},
|
|
{
|
|
"datatype": {
|
|
"dc:format": "application/json",
|
|
"base": "json"
|
|
},
|
|
"name": "array"
|
|
},
|
|
{
|
|
"datatype": {
|
|
"dc:format": "application/json",
|
|
"base": "json"
|
|
},
|
|
"name": "geojson"
|
|
}
|
|
]
|
|
},
|
|
"url": "tests/fixtures/frictionless-data.csv"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
|
|
## Python API
|
|
|
|
Find the Python API documentation at [csvw.readthedocs.io](https://csvw.readthedocs.io/en/latest/).
|
|
|
|
A quick example for using `csvw` from Python code:
|
|
|
|
```python
|
|
import json
|
|
from csvw import CSVW
|
|
data = CSVW('https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/test.tsv')
|
|
print(json.dumps(data.to_json(minimal=True), indent=4))
|
|
[
|
|
{
|
|
"province": "Hello",
|
|
"territory": "world",
|
|
"precinct": "1"
|
|
}
|
|
]
|
|
```
|
|
|
|
|
|
## Known limitations
|
|
|
|
- We read **all** data which is specified as UTF-8 encoded using the
|
|
[`utf-8-sig` codecs](https://docs.python.org/3/library/codecs.html#module-encodings.utf_8_sig).
|
|
Thus, if such data starts with `U+FEFF` this will be interpreted as [BOM](https://en.wikipedia.org/wiki/Byte_order_mark)
|
|
and skipped.
|
|
- Low level CSV parsing is delegated to the `csv` module in Python's standard library. Thus, if a `commentPrefix`
|
|
is specified in a `Dialect` instance, this will lead to skipping rows where the first value starts
|
|
with `commentPrefix`, **even if the value was quoted**.
|
|
- Also, cell content containing `escapechar` may not be round-tripped as expected (when specifying
|
|
`escapechar` or a `csvw.Dialect` with `quoteChar` but `doubleQuote==False`),
|
|
when minimal quoting is specified. This is due to inconsistent `csv` behaviour
|
|
across Python versions (see https://bugs.python.org/issue44861).
|
|
|
|
|
|
## CSVW conformance
|
|
|
|
While we use the CSVW specification as guideline, this package does not (and
|
|
probably never will) implement the full extent of this spec.
|
|
|
|
- When CSV files with a header are read, columns are not matched in order with
|
|
column descriptions in the `tableSchema`, but instead are matched based on the
|
|
CSV column header and the column descriptions' `name` and `titles` atributes.
|
|
This allows for more flexibility, because columns in the CSV file may be
|
|
re-ordered without invalidating the metadata. A stricter matching can be forced
|
|
by specifying `"header": false` and `"skipRows": 1` in the table's dialect
|
|
description.
|
|
|
|
However, `csvw.CSVW` works correctly for
|
|
- 269 out of 270 [JSON tests](https://w3c.github.io/csvw/tests/#manifest-json),
|
|
- 280 out of 282 [validation tests](https://w3c.github.io/csvw/tests/#manifest-validation),
|
|
- 10 out of 18 [non-normative tests](https://w3c.github.io/csvw/tests/#manifest-nonnorm)
|
|
|
|
from the [CSVW Test suites](https://w3c.github.io/csvw/tests/).
|
|
|
|
|
|
## Compatibility with [Frictionless Data Specs](https://specs.frictionlessdata.io/)
|
|
|
|
A CSVW-described dataset is basically equivalent to a Frictionless DataPackage where all
|
|
[Data Resources](https://specs.frictionlessdata.io/data-resource/) are [Tabular Data](https://specs.frictionlessdata.io/tabular-data-resource/).
|
|
Thus, the `csvw` package provides some conversion functionality. To
|
|
"read CSVW data from a Data Package", there's the `csvw.TableGroup.from_frictionless_datapackage` method:
|
|
```python
|
|
from csvw import TableGroup
|
|
tg = TableGroup.from_frictionless_datapackage('PATH/TO/datapackage.json')
|
|
```
|
|
To convert the metadata, the `TableGroup` can then be serialzed:
|
|
```python
|
|
tg.to_file('csvw-metadata.json')
|
|
```
|
|
|
|
Note that the CSVW metadata file must be written to the Data Package's directory
|
|
to make sure relative paths to data resources work.
|
|
|
|
This functionality - together with the schema inference capabilities
|
|
of [`frictionless describe`](https://framework.frictionlessdata.io/docs/guides/describing-data/) - provides
|
|
a convenient way to bootstrap CSVW metadata for a set of "raw" CSV
|
|
files, implemented in the [`csvwdescribe` command described above](#csvwdescribe).
|
|
|
|
|
|
## See also
|
|
|
|
- https://www.w3.org/2013/csvw/wiki/Main_Page
|
|
- https://csvw.org
|
|
- https://github.com/CLARIAH/COW
|
|
- https://github.com/CLARIAH/ruminator
|
|
- https://github.com/bloomberg/pycsvw
|
|
- https://specs.frictionlessdata.io/table-schema/
|
|
- https://github.com/theodi/csvlint.rb
|
|
- https://github.com/ruby-rdf/rdf-tabular
|
|
- https://github.com/rdf-ext/rdf-parser-csvw
|
|
- https://github.com/Robsteranium/csvwr
|
|
|
|
|
|
## License
|
|
|
|
This package is distributed under the [Apache 2.0 license](https://opensource.org/licenses/Apache-2.0).
|