Hello,

Using cppcheck-htmlreport to convert raw cppcheck error reports to HTML
fails for some files because of their encoding.
Here's an example error message:
cppcheck/htmlreport/cppcheck-htmlreport", line 287, in <module>
    content = input_file.read()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 3546:
invalid start byte
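
To illustrate what this error means, here is a minimal sketch (Python 3, with a hypothetical sample string, not taken from the report). The byte 0xfc can never start a UTF-8 sequence, but it is a valid single-byte "ü" in ISO-8859-1, so the same bytes decode under one codec and fail under the other. The helper name try_decode is made up for this example.

```python
# Hypothetical sample: "für" stored as ISO-8859-1, where "ü" is the single byte 0xfc.
data = "f\u00fcr".encode("iso-8859-1")


def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the decoding failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "byte 0x%02x at position %d: %s" % (raw[exc.start], exc.start, exc.reason)


# Strict UTF-8 decoding rejects 0xfc, as in the traceback above.
print(try_decode(data, "utf-8"))

# Decoding with the encoding the bytes were written in succeeds.
print(try_decode(data, "iso-8859-1") == "f\u00fcr")
```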

Here's the list of files which give this problem:
./hwpfilter/source/hcode.cxx
./hwpfilter/source/hwpread.cxx
./hwpfilter/source/hbox.h
./hwpfilter/source/formula.cxx
./hwpfilter/source/hwpfile.cxx
./hwpfilter/source/hwpeq.cxx
./chart2/source/view/charttypes/Splines.cxx (it contained 2 "ü" and was
detected as iso-8859-1 rather than utf8 by "file -i"); now converted (see
http://cgit.freedesktop.org/libreoffice/core/commit/?id=42d494e7925249c36f62206e7268d849437e219d)
./hwpfilter/source/hbox.cxx
./hwpfilter/source/hinfo.cxx

I tried ./hwpfilter/source/hinfo.cxx.
Initial view in vi (Debian testing x86-64, French):
     56 /**
     57  * ¹®¼­Á¤º¸¸¦ ÀоîµéÀÌ´Â ÇÔ¼ö ( 128 bytes )
     58  * ¹®¼­Á¤º¸´Â ÆÄÀÏÀνÄÁ¤º¸( 30 bytes ) ´ÙÀ½¿¡ À§Ä¡ÇÑ Á¤º¸ÀÌ´Ù.
     59  */
     60 bool HWPInfo::Read(HWPFile & hwpf)

Since the hwpfilter README mentions "Hangul Word Processor" and "Korea", I
tried "iconv -f EUC-KR -t utf8 hwpfilter/source/hinfo.cxx >
stdout.txt" and got this:
     56 /**
     57  * 문서정보를 읽어들이는 함수 ( 128 bytes )
     58  * 문서정보는 파일인식정보( 30 bytes ) 다음에 위치한 정보이다.
     59  */
     60 bool HWPInfo::Read(HWPFile & hwpf)
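
For comparison, the iconv step can be sketched in Python 3, assuming the files really are EUC-KR (which is only a guess based on the README). The function name reencode and the sample comment line are illustrative. Note that EUC-KR leaves ASCII bytes unchanged, so a fragment like "( 128 bytes )" would pass through the conversion as is.

```python
def reencode(raw: bytes, source: str = "euc_kr", target: str = "utf-8") -> bytes:
    """Decode raw bytes strictly from `source` and re-encode them as `target`.

    Strict decoding raises UnicodeDecodeError on any byte sequence that is
    not valid in `source`, instead of silently replacing it.
    """
    return raw.decode(source).encode(target)


# One comment line from the converted output, re-encoded to EUC-KR to
# stand in for the legacy file contents (assumption: the file is EUC-KR).
comment = " * 문서정보를 읽어들이는 함수 ( 128 bytes )"
legacy = comment.encode("euc_kr")

# Read as ISO-8859-1, the same bytes give unreadable text like the vi view.
print(legacy.decode("iso-8859-1") != comment)

# Converted the way "iconv -f EUC-KR -t utf8" does, the text is recovered.
print(reencode(legacy).decode("utf-8") == comment)

# The ASCII part of the comment is byte-for-byte identical in both encodings.
print(legacy.endswith(b"( 128 bytes )"))
```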

I then tried Google Translate, which detected the language as Korean
(fortunately! :-)) and translated it as:
"Function to read the document information"
which seems right given the name of the function.
Remark: I don't know what "( 128 bytes )" or "( 30 bytes )" means; is it a
conversion problem?

Anyway, would this conversion be OK for these files, or might we lose some
information?
Of course, I would rather have cppcheck fail the HTML conversion of some
reports than lose important information in these files.
It may also be a cppcheck or Python bug that should be fixed.
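
One possible way to check for information loss before converting, sketched in Python 3 under the assumption that a file is lossless if it decodes strictly and re-encodes to exactly the same bytes. The name is_lossless is hypothetical. If EUC-KR rejects a file, Python's "cp949" codec (a superset of EUC-KR) might be worth trying, but that too is a guess about these files.

```python
def is_lossless(raw: bytes, encoding: str = "euc_kr") -> bool:
    """Return True if `raw` decodes strictly as `encoding` and round-trips.

    A strict decode fails on invalid byte sequences; comparing the
    re-encoded bytes with the original catches anything altered on the way.
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return text.encode(encoding) == raw


# Sample bytes standing in for a legacy source file (assumed to be EUC-KR).
sample = " * 문서정보는 파일인식정보( 30 bytes ) 다음에 위치한 정보이다.".encode("euc_kr")

print(is_lossless(sample))           # valid EUC-KR, round-trips unchanged
print(is_lossless(sample, "utf-8"))  # the same bytes are not valid UTF-8
print(is_lossless(b"\xfc"))          # a lone lead byte is rejected
```

For a real file, the bytes could be read with open(path, "rb").read() and passed to the same function.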

Any idea?

Julien



--
View this message in context: 
http://nabble.documentfoundation.org/What-encoding-is-used-tp4105106.html
Sent from the Dev mailing list archive at Nabble.com.
