Skip to content

SV PV COW 255 doesn't support strings under 9 SvCUR() bytes long #23261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bulk88 opened this issue May 5, 2025 · 3 comments
Open

SV PV COW 255 doesn't support strings under 9 SvCUR() bytes long #23261

bulk88 opened this issue May 5, 2025 · 3 comments

Comments

@bulk88
Copy link
Contributor

bulk88 commented May 5, 2025

Description
from sv.c

#ifndef SV_COW_MAX_WASTE_FACTOR_THRESHOLD
#    define SV_COW_MAX_WASTE_FACTOR_THRESHOLD   2    /* COW iff len < (cur * K) */
#endif
#ifndef SV_COWBUF_WASTE_FACTOR_THRESHOLD
#    define SV_COWBUF_WASTE_FACTOR_THRESHOLD    2    /* COW iff len < (cur * K) */
#endif

8 bytes or smaller strings can't 255 COW because of ^^^^^ added in 5.19.12 in

#13800
e8c6a47

Author: Yves Orton [email protected]
Date: 5/11/2014 6:37:33 AM
Message:
Implement "max waste" thresholds to avoid problems with COW and deliberately overallocated pvs
https://rt.perl.org/Ticket/Display.html?id=121796

Steps to Reproduce


use Devel::Peek;
 #use Inline  ('force', 'noclean');

use Inline C => Config =>
 PRE_HEAD => '#define PERL_NO_GET_CONTEXT 1';
use Inline C => Config => BUILD_NOISY => 1;

use Inline C => <<'END_OF_C_CODE';


SV* do3(SV* ssv) {
 dTHX;
 U32 flags = 0;
    SV* dsv;
    if ((SvFLAGS(ssv) & ((SVf_OK|SVs_GMG) &~(SVp_NOK))) == SVf_NOK)
        dsv = newSVnv(SvNVX(ssv));
    else if ((SvFLAGS(ssv) & ((SVf_OK|SVs_GMG) &~(SVp_IOK))) == SVf_IOK) {
        IV iv = SvIVX(ssv);
        dsv = SvUOK(ssv) ? newSVuv((UV)iv) : newSViv(iv);
    }
    else {
        U32 type = SvPOK(ssv) ? SVt_PV : SvNOK(ssv) ? SVt_NV : SVt_IV;
        if (flags & SVs_TEMP)
            dsv = newSV_type_mortal(type);
        else
            dsv = newSV_type(type);
        sv_dump_depth(dsv, 3);
        sv_dump_depth(ssv, 3);
        sv_setsv_flags(dsv, ssv, SV_GMAGIC | SV_NOSTEAL
            | SV_COW_SHARED_HASH_KEYS | SV_COW_OTHER_PVS | SV_DO_COW_SVSETSV);
        sv_dump_depth(ssv, 3);
        sv_dump_depth(dsv, 3);
        return dsv;
    }
    if (flags & SVs_TEMP)
        dsv = sv_2mortal(dsv);
    return dsv;
}


END_OF_C_CODE


$, = "\n";

use v5.30;
use strict;
use warnings;
#$DB::single = 1;

#$DB::single = 1;

use version;
my $s1;
my $s2;
my $s3;
my $s4;
warn "\nwith 8 bytes\n\n";
$s1 = substr(int(rand(10)),0,1).'.345678';
$s2 = do3($s1);
warn "\nnow with 9 bytes\n\n";
$s1 = substr(int(rand(10)),0,1).'.3456789';
$s3 = do3($s1);
warn "\nwith 4 bytes\n\n";
$s1 = substr(int(rand(10)),0,1).'.34';
$s4 = do3($s1);
with 8 bytes

SV = PV(0x2abb18) at 0x2ab0f0
  REFCNT = 1
  FLAGS = ()
  PV = 0
SV = PV(0x2abae8) at 0x244ce88
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x231e048 "4.345678"\0
  CUR = 8
  LEN = 16
SV = PV(0x2abae8) at 0x244ce88
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x231e048 "4.345678"\0
  CUR = 8
  LEN = 16
SV = PV(0x2abb18) at 0x2ab0f0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2496e98 "4.345678"\0
  CUR = 8
  LEN = 16

now with 9 bytes

SV = PV(0x2abb48) at 0x2ab0f0
  REFCNT = 1
  FLAGS = ()
  PV = 0
SV = PV(0x2abae8) at 0x244ce88
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x231e048 "7.3456789"\0
  CUR = 9
  LEN = 16
SV = PV(0x2abae8) at 0x244ce88
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x231e048 "7.3456789"\0
  CUR = 9
  LEN = 16
  COW_REFCNT = 1
SV = PV(0x2abb48) at 0x2ab0f0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x231e048 "7.3456789"\0
  CUR = 9
  LEN = 16
  COW_REFCNT = 1

with 4 bytes

SV = PV(0x2abb68) at 0x2ab0f0
  REFCNT = 1
  FLAGS = ()
  PV = 0
SV = PV(0x2abae8) at 0x244ce88
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2495098 "8.34"\0
  CUR = 4
  LEN = 16
SV = PV(0x2abae8) at 0x244ce88
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2495098 "8.34"\0
  CUR = 4
  LEN = 16
SV = PV(0x2abb68) at 0x2ab0f0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2497018 "8.34"\0
  CUR = 4
  LEN = 16

Expected behavior
That tiny strings can SVPV 255 COW. 8 bytes and under are often tokens/fields/keys/prop names/string-ified integers/stringified 1 or 0 bools/api func calls or var names/json prop names. All of those are very high frequency to see bounce around at runtime on end user code.

Perl_sv_grow()/PERL_STRLEN_ROUNDUP/PERL_STRLEN_EXPAND_SHIFT/PERL_STRLEN_NEW_MIN/newSVpvn/etc should not be discriminating against short strings held in Perl API's Newx() backed, API min buf length of 0x10==SvLEN().

Perl configuration

Summary of my perl5 (revision 5 version 41 subversion 7) configuration:
  Derived from: 73172a67eaae5671dffc06b427f005810d151472
  Platform:
    osname=MSWin32
    osvers=6.1.7601
    archname=MSWin32-x64-multi-thread
    uname=''
    config_args='undef'
    hint=recommended
    useposix=true
    d_sigaction=undef
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cl'
    ccflags ='-nologo -GF -W3 -MD -TC -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -D_
CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_WINSOCK_DEPRECATED_NO_WAR
NINGS -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -DWIN32_NO_REGI
STRY -DUSE_PERLIO'
    optimize='-O1 -Zi -GL -fp:precise'
    cppflags='-DWIN32'
    ccversion='19.36.32535'
    gccversion=''
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=undef
    longlongsize=8
    d_longdbl=define
    longdblsize=8
    longdblkind=0
    ivtype='__int64'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='__int64'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='link'
    ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\pb64\
lib\CORE" -machine:AMD64 -subsystem:console,"5.02"'
    libpth="C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSV
C\14.36.32532\lib\x64"
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.li
b advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.l
ib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
 vcruntime.lib ucrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg3
2.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_
32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt
.lib vcruntime.lib ucrt.lib
    libc=ucrt.lib
    so=dll
    useshrplib=true
    libperl=perl541.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs
    dlext=dll
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:
\pb64\lib\CORE" -machine:AMD64 -subsystem:console,"5.02"'


Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_TIMES
    HAVE_INTERP_INTERN
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_IMPLICIT_SYS
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_NO_REGISTRY
    USE_PERLIO
    USE_PERL_ATOF
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    uncommitted-changes
  Built under MSWin32
  Compiled at Dec 20 2024 10:03:46
  %ENV:
    PERL_JSON_BACKEND="Cpanel::JSON::XS"
    PERL_YAML_BACKEND="YAML::XS"
  @INC:
    C:/pb64/site/lib/MSWin32-x64-multi-thread
    C:/pb64/site/lib
    C:/pb64/lib
@khwilliamson
Copy link
Contributor

I don't understand why COW is advantageous on small strings.

@bulk88
Copy link
Contributor Author

bulk88 commented May 9, 2025

I don't understand why COW is advantageous on small strings.

Do you want 500 or 1000 copies of the 5 letter string "undef" in ur perl process? Plus round to 16 bytes plus 8 or 16 bytes for malloc internals, plus 24 bytes for threaded win perl?

How about hv_intrnext() and list to list assignment, 500 or 1000 more copies of 5 byte string "undef", how many times is perl supposed to malloc()/free() cycle a 1 byte long string like "1" or "undef"? It's a performance problem.

A profiling tool says the pvx buffer of newSVpvs(""); on win64 cost 70 bytes. Most allocs in a perl vm on the bellcurve are mid 70s bytes peaking at 82-85 bytes. So yes COW for strings 8 bytes and under are very necessary. They cost TEN times their own length in final memory cost. IdK why on win64 70 bytes is the lowest alloc perl makes, it's not a unit of eight so I'm guessing the profiler has private access to the windows malloc system and windows keeps 8 bytes behind the pointer, plus a secret 4 bytes somewhere else in address space of overhead and the other four bytes are anti-security exploit intergrity metadata.

@tonycoz
Copy link
Contributor

tonycoz commented May 15, 2025

8 bytes or smaller strings can't 255 COW because of ^^^^^ added in 5.19.12 in

$ perl -MDevel::Peek -e 'Dump("Hello")'
SV = PV(0x55a3bb5c8f10) at 0x55a3bb5f4df8
  REFCNT = 1
  FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
  PV = 0x55a3bb5fd5f0 "Hello"\0
  CUR = 5
  LEN = 10
  COW_REFCNT = 0
$ perl -MDevel::Peek -e 'my $x = "Hello"; Dump($x)'
SV = PV(0x55afb6cebea0) at 0x55afb6d17ff0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55afb6d205f0 "Hello"\0
  CUR = 5
  LEN = 10
  COW_REFCNT = 1

though this is a special case for literals and copies of them.

From memory when the current CoW was introduced CoWing short strings ended up slower than doing a normal copy, YMMV for systems with a slow malloc() (looking at a certain popular non-POSIX platform).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants