    Don't read past end of string in Guess ctor · 31a8d9c7
    Stephan Bergmann authored
    <https://ci.libreoffice.org//job/lo_ubsan/1082/>:
    > ==26422==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000accc72 at pc 0x2ae43e63f4b6 bp 0x2ae43e600510 sp 0x2ae43e600508
    > READ of size 1 at 0x604000accc72 thread T70 (cppu_threadpool)
    >     #0 0x2ae43e63f4b5 in Guess::Guess(char const*) /lingucomponent/source/languageguessing/guess.cxx:95:28
    >     #1 0x2ae43e667f2f in SimpleGuesser::GetManagedLanguages(char) /lingucomponent/source/languageguessing/simpleguesser.cxx:169:19
    >     #2 0x2ae43e668420 in SimpleGuesser::GetAvailableLanguages() /lingucomponent/source/languageguessing/simpleguesser.cxx:179:12
    >     #3 0x2ae43e64a18e in LangGuess_Impl::getEnabledLanguages() /lingucomponent/source/languageguessing/guesslang.cxx:229:24
    [...]
    > 0x604000accc72 is located 0 bytes to the right of 34-byte region [0x604000accc50,0x604000accc72)
    > allocated by thread T70 (cppu_threadpool) here:
    [...]
    >     #7 0x2ae43e65350a in std::string::operator+=(char const*) /home/tdf/lode/opt_private/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0/bits/basic_string.h:3355:16
    >     #8 0x2ae43e667e6e in SimpleGuesser::GetManagedLanguages(char) /lingucomponent/source/languageguessing/simpleguesser.cxx:168:21
    >     #9 0x2ae43e668420 in SimpleGuesser::GetAvailableLanguages() /lingucomponent/source/languageguessing/simpleguesser.cxx:179:12
    >     #10 0x2ae43e64a18e in LangGuess_Impl::getEnabledLanguages() /lingucomponent/source/languageguessing/guesslang.cxx:229:24
    [...]
    
    shows, during UITest_librelogo, the Guess ctor making wrong assumptions about
    the structure of guess_str and skipping over the terminating NUL.  Locally I
    could see that while most inputs do have the expected syntax of starting with
    "[" and containing two "-", one input is indeed just "[haw-utf8" without a
    second "-".
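    
    For illustration, here is a hypothetical helper that counts the "-" separators
    in a guess string. countDashes is not part of the LibreOffice sources. It uses
    std::strchr, which returns nullptr on reaching the terminating NUL, so it can
    never step past the end of the buffer. Applied to the "[haw-utf8" sample it
    yields 1, not the 2 that the original parsing assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from guess.cxx): count the '-' separators in a
// NUL-terminated guess string such as "[haw-utf8".
std::size_t countDashes(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    // std::strchr stops at the terminating NUL and returns nullptr, so this
    // loop cannot read past the end of the string. p + 1 is always in bounds
    // because *p == '-' (not NUL) whenever the loop body runs.
    for (const char* p = std::strchr(s, '-'); p != nullptr;
         p = std::strchr(p + 1, '-')) {
        ++n;
    }
    return n;
}
```

    An input with the expected syntax, such as the illustrative "[en-GB-utf8", gives
    2. Only an unchecked "skip to the second '-'" scan turns the "[haw-utf8" case
    into a heap-buffer-overflow.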
    
    I don't know where the strings passed into the Guess ctor in the two places in
    lingucomponent/source/languageguessing/simpleguesser.cxx ultimately come from,
    nor what their guaranteed syntax and semantics are.  So, from the existing
    code and the non-well-formed "[haw-utf8" sample (where the second segment
    apparently designates an encoding, not a country), construct rules for
    robustly parsing any input into potential language/country/encoding parts.
    (What is obvious from the call sites is that, for one of them, each input
    starts with "[", and for the other, the item to parse need be neither "]"- nor
    NUL-terminated.)
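    
    A minimal C++ sketch of one such rule set follows. It is not the code that went
    into guess.cxx, and parseGuess and GuessParts are hypothetical names. It assumes
    that a leading "[" is optional and that an item ends at the first NUL, "[" or
    "]". It also assumes that the item splits at "-" into at most three segments:
    language-country-encoding for three segments, language-encoding for two (as in
    "[haw-utf8"), and language only for one:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type; the real Guess class keeps language_str and
// country_str as members instead.
struct GuessParts {
    std::string language;
    std::string country;
    std::string encoding;
};

// Sketch only: the delimiter and segment rules are assumptions inferred from
// the commit message, not taken from the LibreOffice sources.
GuessParts parseGuess(const char* s) {
    assert(s != nullptr);
    if (*s == '[') {
        ++s;  // skip the optional opening bracket
    }
    // Find the end of the item: the first NUL, '[' or ']'. The NUL check
    // comes first, so the scan never moves past the end of the buffer.
    const char* end = s;
    while (*end != '\0' && *end != '[' && *end != ']') {
        ++end;
    }
    const std::string item(s, end);

    GuessParts parts;
    const std::size_t dash1 = item.find('-');
    if (dash1 == std::string::npos) {
        parts.language = item;  // one segment: language only
        return parts;
    }
    parts.language = item.substr(0, dash1);
    const std::size_t dash2 = item.find('-', dash1 + 1);
    if (dash2 == std::string::npos) {
        // Two segments, e.g. "haw-utf8": the second one is the encoding.
        parts.encoding = item.substr(dash1 + 1);
    } else {
        // Three segments: language-country-encoding. Any further '-'
        // characters end up inside the encoding segment.
        parts.country = item.substr(dash1 + 1, dash2 - dash1 - 1);
        parts.encoding = item.substr(dash2 + 1);
    }
    return parts;
}
```

    Because the item boundary is found before any splitting, inputs that continue
    past the item (for example "de-DE-utf8][fr-FR-utf8]") are handled the same way
    as NUL-terminated ones. The encoding part is parsed here only to keep the
    segment rules unambiguous; as noted below, the Guess class never actually used it.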
    
    (Guess::encoding_str and the local enc variable have effectively been unused
    ever since the code's introduction in 07628119
    "INTEGRATION: CWS languageguessing".  Guess::encoding_str, but not the local
    enc variable, was later removed with b275246c
    "loplugin:unusedfields in formula..registry".)
    
    Change-Id: Icbedc05ed5b119ee4efbc3118cc17076a4d80c74
    Reviewed-on: https://gerrit.libreoffice.org/62390
    Tested-by: Jenkins
    Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
Name
guess.cxx
guess.hxx
guesslang.component
guesslang.cxx
simpleguesser.cxx
simpleguesser.hxx