Commit 2049e55f authored by Damjan Jovanovic, committed by Eike Rathke

Make CSV line parsers consistent with CSV field parsers.

Our CSV field parsing algorithms treat fields starting with a quote
(immediately at the beginning of the row, or right after the field delimiter) as
quoted. A quoted field ends at the corresponding closing quote, and any
remaining text between the closing quote and the next field delimiter or end
of line is appended to the text already extracted from the field, but not
processed further. Any quotes in this extra text are taken verbatim; they
do not quote anything.
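The field-splitting rules just described can be sketched in portable C++. This is an illustrative sketch, not the Calc or Base code. `splitFields` is a hypothetical name, and `std::string` stands in for the LibreOffice string types. The sketch also assumes the usual CSV convention that a doubled quote inside a quoted field stands for one literal quote, which this commit message does not state.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits one logical CSV record into fields.
// Assumption: inside a quoted field, a doubled quote ("") is one literal quote.
std::vector<std::string> splitFields(const std::string& line,
                                     char fieldDelim = ',',
                                     char quote = '"')
{
    // The two separators must differ for these rules to make sense.
    assert(fieldDelim != quote);
    std::vector<std::string> fields;
    std::string::size_type i = 0;
    const std::string::size_type n = line.size();
    while (true)
    {
        std::string field;
        if (i < n && line[i] == quote)
        {
            // Quoted field: copy text up to the matching closing quote.
            ++i;
            while (i < n)
            {
                if (line[i] == quote)
                {
                    if (i + 1 < n && line[i + 1] == quote)
                    {
                        field += quote;      // escaped (doubled) quote
                        i += 2;
                        continue;
                    }
                    ++i;                     // skip the closing quote
                    break;
                }
                field += line[i++];
            }
        }
        // Unquoted field, or leftover text after a closing quote: appended
        // verbatim up to the next field delimiter; quotes here mean nothing.
        while (i < n && line[i] != fieldDelim)
            field += line[i++];
        fields.push_back(field);
        if (i >= n)
            break;
        ++i;                                 // skip the field delimiter
    }
    return fields;
}
```

Applied to the example row discussed below, `"another" ",something else`, this yields the two fields `another "` and `something else`.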

Our CSV line parsers were big hacks - they essentially read and concatenate
lines until an even number of quote characters is found, and then feed this
through the CSV field parsers.
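The following simplified model of the old approach is included for comparison. It is not the original code, which counted tokens with `comphelper::string::getTokenCount`. `readRecordByQuoteParity` and the vector-of-lines input are hypothetical stand-ins for the file stream. The model joins physical lines with `\n` for as long as the accumulated text contains an odd number of quote characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of the old line reader (hypothetical name): keep
// appending physical lines while the record has an odd number of quotes.
std::string readRecordByQuoteParity(const std::vector<std::string>& lines,
                                    std::size_t& next, char quote = '"')
{
    assert(next < lines.size());
    std::string record = lines[next++];
    auto quotes = [&](const std::string& s) {
        std::size_t count = 0;
        for (char c : s)
            if (c == quote)
                ++count;
        return count;
    };
    // Parity is the only criterion; quote positions are ignored.
    while (quotes(record) % 2 != 0 && next < lines.size())
        record += "\n" + lines[next++];
    return record;
}
```

The example row `"another" ",something else` contains three quotes. Any following rows without quotes keep the count odd, so they are swallowed into the same record until another stray quote flips the parity or the input ends.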

This patch rewrites the line parsers to work exactly like the field
parsers. Text such as:
"another" ",something else
is now correctly parsed by both Calc and Base as:
[another "],[something else]
instead of breaking all further parsing.
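The loop added by this patch can be ported to standard C++ so it can be tried outside LibreOffice. This is a sketch under assumptions. `readRecord` is a hypothetical name, a vector of physical lines replaces `m_pFileStream`, and `std::string` replaces `QuotedTokenizedString`. At the end of input the port simply stops, so the original `IsEof()` handling is not reproduced. The state variables mirror `isQuoted`, `isFieldStarting` and `wasQuote` from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the patch's state machine (hypothetical name and types).
// Returns one logical record, joining physical lines with '\n' only while
// a line ends inside a quoted field.
std::string readRecord(const std::vector<std::string>& lines, std::size_t& next,
                       char fieldDelim = ',', char quote = '"')
{
    assert(fieldDelim != quote);
    std::string record = lines.at(next++);
    std::size_t lastOffset = 0;     // where scanning resumes after a join
    bool isQuoted = false;          // currently inside a quoted field
    bool isFieldStarting = true;    // next character begins a new field
    while (true)
    {
        bool wasQuote = false;      // previous char in a quoted field was a quote
        for (std::size_t i = lastOffset; i < record.size(); ++i)
        {
            const char c = record[i];
            if (isQuoted)
            {
                if (c == quote)
                    wasQuote = !wasQuote;      // "" toggles back: escaped quote
                else if (wasQuote)
                {
                    // The previous quote closed the field; the rest is verbatim.
                    wasQuote = false;
                    isQuoted = false;
                    if (c == fieldDelim)
                        isFieldStarting = true;
                }
            }
            else if (isFieldStarting)
            {
                isFieldStarting = false;
                if (c == quote)
                    isQuoted = true;           // field opens with a quote
                else if (c == fieldDelim)
                    isFieldStarting = true;    // empty field
            }
            else if (c == fieldDelim)
                isFieldStarting = true;
        }
        if (wasQuote)
            isQuoted = false;       // line ended right after a closing quote
        if (!isQuoted || next >= lines.size())
            break;
        // Still inside a quoted field: the line break belongs to the field.
        lastOffset = record.size();
        record += "\n" + lines[next++];
    }
    return record;
}
```

A record spans several physical lines only when a line ends inside a quoted field. The example row above is therefore read as a single line, and a stray quote after a closing quote no longer affects the rows that follow.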

Patch by: me

(cherry picked from commit 60e93b8b)

Change-Id: Iced60fad9371e17a2e5640cd7169804b18cf5103
Reviewed-on: https://gerrit.libreoffice.org/24999
Tested-by: Jenkins <ci@libreoffice.org>
Reviewed-by: Eike Rathke <erack@redhat.com>
Tested-by: Eike Rathke <erack@redhat.com>
Parent: d94b827c
@@ -890,14 +890,61 @@ bool OFlatTable::readLine(sal_Int32 * const pEndPos, sal_Int32 * const pStartPos
             return false;
         QuotedTokenizedString sLine = m_aCurrentLine; // check if the string continues on next line
-        while( (comphelper::string::getTokenCount(sLine.GetString(), m_cStringDelimiter) % 2) != 1 )
+        sal_Int32 nLastOffset = 0;
+        bool isQuoted = false;
+        bool isFieldStarting = true;
+        while (true)
         {
-            m_pFileStream->ReadByteStringLine(sLine,nEncoding);
-            if ( !m_pFileStream->IsEof() )
+            bool wasQuote = false;
+            const sal_Unicode *p = sLine.GetString().getStr() + nLastOffset;
+            while (*p)
             {
-                OUString aStr = m_aCurrentLine.GetString() + "\n" + sLine.GetString();
-                m_aCurrentLine.SetString(aStr);
-                sLine = m_aCurrentLine;
+                if (isQuoted)
+                {
+                    if (*p == m_cStringDelimiter)
+                        wasQuote = !wasQuote;
+                    else
+                    {
+                        if (wasQuote)
+                        {
+                            wasQuote = false;
+                            isQuoted = false;
+                            if (*p == m_cFieldDelimiter)
+                                isFieldStarting = true;
+                        }
+                    }
+                }
+                else
+                {
+                    if (isFieldStarting)
+                    {
+                        isFieldStarting = false;
+                        if (*p == m_cStringDelimiter)
+                            isQuoted = true;
+                        else if (*p == m_cFieldDelimiter)
+                            isFieldStarting = true;
+                    }
+                    else if (*p == m_cFieldDelimiter)
+                        isFieldStarting = true;
+                }
+                ++p;
+            }
+            if (wasQuote)
+                isQuoted = false;
+            if (isQuoted)
+            {
+                nLastOffset = sLine.Len();
+                m_pFileStream->ReadByteStringLine(sLine,nEncoding);
+                if ( !m_pFileStream->IsEof() )
+                {
+                    OUString aStr = m_aCurrentLine.GetString() + "\n" + sLine.GetString();
+                    m_aCurrentLine.SetString(aStr);
+                    sLine = m_aCurrentLine;
+                }
+                else
+                    break;
             }
             else
                 break;
......