

Thank you. I read the W3C recommendation, as well as the referenced
documents. I drafted a comparison here:
  https://github.com/pmitros/tsvx/blob/master/doc_source/related_formats.md

I think the standards are trying to do something a bit different, and are
actually pretty complementary. tsvx is designed to facilitate compatibility
between applications for internal data analysis and BI work. It is a
prescriptive standard. It says how files ought to be escaped and formatted.
The W3C CSV on the Web group appears to be doing exactly what its name
implies: providing descriptive metadata for the public redistribution of
datasets on the web, especially for use on the semantic web. It is a
descriptive standard designed to work with essentially all tabular data files. A
tsvx file could certainly be described with the W3C metadata if the
intention were external distribution.
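A prescriptive rule matters because a writer and a reader only interoperate if they agree on escaping exactly. As a minimal sketch, assuming one common backslash convention (tab, newline, carriage return and backslash written as `\t`, `\n`, `\r` and `\\`) rather than whatever tsvx itself specifies, a Python writer and reader that share a single rule can round-trip any string field. The function names are hypothetical.

```python
# Sketch of ONE fixed TSV escaping rule shared by writer and reader.
# Assumption: this backslash convention is illustrative, not the tsvx spec.

_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(value: str) -> str:
    """Replace backslash, tab, newline and carriage return with escape sequences."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(text: str) -> str:
    """Invert escape_field; raise ValueError on an unknown or dangling escape."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # consume the character after the backslash
        if nxt not in _UNESCAPES:
            raise ValueError(f"bad escape sequence after backslash: {nxt!r}")
        out.append(_UNESCAPES[nxt])
    return "".join(out)


def write_row(fields: list[str]) -> str:
    """Escape each field, join with literal tabs, end with a newline."""
    return "\t".join(escape_field(f) for f in fields) + "\n"


def read_row(line: str) -> list[str]:
    """Split on literal tabs (escaped tabs are never literal), then unescape."""
    return [unescape_field(f) for f in line.rstrip("\n").split("\t")]


if __name__ == "__main__":
    row = ["a\tb", "line1\nline2", "C:\\path"]
    assert read_row(write_row(row)) == row
```

Because the escaped form never contains a literal tab or newline, splitting on tabs and lines is always safe, and a reader that meets an unknown sequence fails loudly instead of guessing.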

To give a sense of the use cases I have internally:

   - I have pipelines where I might have a dozen TSV files generated by
   scripts working on data from MySQL, Vertica, and spreadsheets, all feeding
   back to create reports. Before I switched to tsvx, the scripts were brittle
   in the face of fairly modest format changes (e.g. adding a column), and
   contained a lot of unnecessary logic for parsing data types.
   - Each time I import something into a tool I didn't create, I need to
   click through a dialog to tell it what the delimiter is and, in
   LibreOffice, to reformat the column types.
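The brittleness in the first point usually comes from scripts that address columns by position and re-derive types by hand. A minimal Python sketch of the alternative: look columns up by header name and convert them with declared types. The `SCHEMA` mapping and the column names are hypothetical; a format with a metadata header would carry the types in the file itself, and tsvx's actual syntax for that is not reproduced here.

```python
import csv
import io

# Hypothetical per-column types; in a self-describing file these would
# come from the file's own metadata rather than from the script.
SCHEMA = {"name": str, "count": int, "ratio": float}


def read_typed(stream, schema):
    """Yield one dict per row, keyed by header name.

    Columns are found by name, so adding or reordering columns does not
    break the reader; columns missing from schema stay as strings.
    """
    # QUOTE_NONE: treat double quotes as ordinary characters, as plain TSV does.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {name: schema.get(name, str)(value) for name, value in row.items()}


if __name__ == "__main__":
    # A newer version of the file that gained a "term" column.
    newer = "name\tterm\tcount\tratio\nCS101\tfall\t1200\t0.35\n"
    for row in read_typed(io.StringIO(newer), SCHEMA):
        print(row)
```

The extra `term` column is simply passed through as a string, while `count` and `ratio` still arrive as an `int` and a `float`, so the downstream report code does not change.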

Adding W3C metadata files would add overhead for this type of work, rather
than reducing it, and would only provide a benefit at the final-results
stage.

Piotr

On Thu, Nov 3, 2016 at 10:35 AM, Eike Rathke <erack@redhat.com> wrote:

Hi Piotr,

On Thursday, 2016-11-03 08:08:23 -0400, Piotr Mitros wrote:

I do a fair bit of work where I move data between LibreOffice, MySQL,
Vertica, Google Docs, Hadoop, Python, and a few other systems. The
formatting of TSV files is ad hoc. Each system has little differences in
how strings are escaped, and the like. In addition, there is no way to
preserve metadata.

I drafted a modest proposed spec for standardizing TSV files by
standardizing types, and adding metadata, and was hoping to solicit
feedback on that proposal:

http://www.tsvx.org/

It seems to me you're attempting to reinvent the wheel. I suggest you take
a look at https://www.w3.org/standards/techs/csv and maybe
https://www.w3.org/community/csvw/

  Eike

--
LibreOffice Calc developer. Number formatter stricken i18n
transpositionizer.
GPG key "ID" 0x65632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563 2D3A
Better use 64-bit 0x6A6CD5B765632D3A here is why: https://evil32.com/
Care about Free Software, support the FSFE https://fsfe.org/support/?erack




Privacy Policy | Impressum (Legal Info) | Copyright information: Unless otherwise specified, all text and images on this website are licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the source code of LibreOffice, which is licensed under the Mozilla Public License (MPLv2). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy.