Thank you. I read the W3C recommendation, as well as the referenced
documents. I drafted a comparison here:
https://github.com/pmitros/tsvx/blob/master/doc_source/related_formats.md
I think the standards are trying to do something a bit different, and are
actually pretty complementary. tsvx is designed to facilitate compatibility
between applications for internal data analysis and BI work. It is a
prescriptive standard. It says how files ought to be escaped and formatted.
The W3C CSV on the Web group appears to be doing exactly what the name
implies: providing descriptive metadata for public redistribution of
datasets on the web, especially for use on the semantic web. It is a
descriptive standard designed to work with essentially all tabular data
files. A tsvx file could certainly be described with the W3C metadata if
the intention were external distribution.
To give a sense of the use cases I have internally:
- I have pipelines where I might have a dozen TSV files generated by
scripts working on data from MySQL, Vertica, and spreadsheets, all feeding
back to create reports. Before I switched to tsvx, scripts were brittle to
fairly modest format changes (e.g. adding a column), and had a bunch of
unnecessary logic parsing data types.
- Each time I import something into a tool I didn't create, I need to
click through a dialog letting it know what the delimiter is, and in
LibreOffice, reformat column types.
Adding W3C metadata files would add overhead for this type of work, rather
than reducing it, and would only provide benefit at the stage of the final
results.
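
As a rough illustration of the first use case, here is a minimal Python sketch of a reader that looks columns up by name and converts values using types declared in the file itself. The two-row header (names, then types), the type names, and the `read_typed_tsv` helper are all made up for this example; this is not the actual tsvx syntax.

```python
import csv
import io

# Hypothetical layout: row 1 = column names, row 2 = type names, then data.
# This is an illustration only, not the actual tsvx syntax.
SAMPLE = (
    "user\tvisits\tscore\n"
    "str\tint\tfloat\n"
    "alice\t3\t0.5\n"
    "bob\t7\t1.25\n"
)

# Map the hypothetical type names to Python converters.
CONVERTERS = {"str": str, "int": int, "float": float}


def read_typed_tsv(text):
    """Return a list of dicts keyed by column name, with values converted."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    names = next(reader)   # first row: column names
    types = next(reader)   # second row: declared type per column
    convert = [CONVERTERS[t] for t in types]
    return [
        {name: conv(cell) for name, conv, cell in zip(names, convert, row)}
        for row in reader
    ]


rows = read_typed_tsv(SAMPLE)
# Columns are addressed by name, so the consumer does not depend on position.
total_visits = sum(row["visits"] for row in rows)
print(total_visits)
```

Because the consumer asks for `row["visits"]` rather than a fixed column index, adding a column upstream does not break it, and the type conversion lives in one place instead of in every script.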
Piotr
On Thu, Nov 3, 2016 at 10:35 AM, Eike Rathke <erack@redhat.com> wrote:
> Hi Piotr,
>
> On Thursday, 2016-11-03 08:08:23 -0400, Piotr Mitros wrote:
>
> > I do a fair bit of work where I move data between LibreOffice, MySQL,
> > Vertica, Google Docs, Hadoop, Python, and a few other systems. The
> > formatting of TSV files is ad-hoc. Each system has little differences in
> > how strings are escaped, and similar. In addition, there is no way to
> > preserve metadata.
> >
> > I drafted a modest proposed spec for standardizing TSV files by
> > standardizing types, and adding metadata, and was hoping to solicit
> > feedback on that proposal:
> > http://www.tsvx.org/
>
> It seems to me you're attempting to reinvent a wheel. I suggest you take
> a look at https://www.w3.org/standards/techs/csv and maybe
> https://www.w3.org/community/csvw/
>
> Eike
>
> --
> LibreOffice Calc developer. Number formatter stricken i18n
> transpositionizer.
> GPG key "ID" 0x65632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A
> Better use 64-bit 0x6A6CD5B765632D3A here is why: https://evil32.com/
> Care about Free Software, support the FSFE https://fsfe.org/support/?erack