Compress the compiled bytecode. #18959

Open, wants to merge 2 commits into base: master

silene
Contributor

@silene silene commented Apr 19, 2024

The compression scheme is quite naive. It is based on the observation that, in most cases, a 4-byte bytecode word is a small byte followed by three NUL bytes. Such a word is stored directly as a single byte. Any other word takes more bytes, up to 5 bytes for a full word.

This brings a 4% reduction of the overall .vo size of the standard library, that is, a 2MB reduction.
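
The idea described above can be sketched as a small variable-length encoder and decoder. This is only an illustration under stated assumptions, not the code of the patch: the escape bytes `tag16` and `tag32`, the little-endian payloads, and the names `compress` and `decompress` are all hypothetical, and words are modeled as OCaml integers in the range 0 to 0xffffffff (so a 64-bit OCaml is assumed).

```ocaml
(* Hypothetical escape bytes; the real patch may use different values. *)
let tag16 = 0xfe  (* next 2 bytes hold a 16-bit word, little-endian *)
let tag32 = 0xff  (* next 4 bytes hold a full 32-bit word, little-endian *)

(* Append the [n] low-order bytes of [w] to [buf], least significant first. *)
let add_le buf w n =
  for i = 0 to n - 1 do
    Buffer.add_char buf (Char.chr ((w lsr (8 * i)) land 0xff))
  done

(* Encode one word: 1 byte if small, 3 bytes if it fits in 16 bits,
   5 bytes otherwise. *)
let encode_word buf w =
  if w < tag16 then add_le buf w 1
  else if w <= 0xffff then (add_le buf tag16 1; add_le buf w 2)
  else (add_le buf tag32 1; add_le buf w 4)

let compress words =
  let buf = Buffer.create 16 in
  List.iter (encode_word buf) words;
  Buffer.contents buf

(* Read [n] little-endian bytes of [s] starting at [pos]. *)
let get_le s pos n =
  let w = ref 0 in
  for i = n - 1 downto 0 do
    w := (!w lsl 8) lor Char.code s.[pos + i]
  done;
  !w

let decompress s =
  let rec go pos acc =
    if pos >= String.length s then List.rev acc
    else
      let b = Char.code s.[pos] in
      if b = tag16 then go (pos + 3) (get_le s (pos + 1) 2 :: acc)
      else if b = tag32 then go (pos + 5) (get_le s (pos + 1) 4 :: acc)
      else go (pos + 1) (b :: acc)
  in
  go 0 []

(* Four small words, one 16-bit word, one full word. *)
let sample = [0x2c; 0x01; 0x00; 0x88; 0x1234; 0xfffffff9]
let packed = compress sample
```

With this sketch, the six sample words occupy 12 bytes instead of 24, and `decompress packed` gives back `sample`. A word never shrinks below one byte, which is where the 75% upper bound on the compression ratio comes from.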

@silene silene added kind: performance Improvements to performance and efficiency. part: VM Virtual machine. labels Apr 19, 2024
@silene silene requested a review from a team as a code owner April 19, 2024 16:42
@coqbot-app coqbot-app bot added the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Apr 19, 2024
@JasonGross
Member

Can we (a) check the size reduction in .vo size across the entire CI somehow, and (b) get monogram statistics for the bytecode words across the entire CI so that we can order them optimally?

@SkySkimmer
Contributor

@coqbot bench

This ensures that the two rightmost bytes are nul in the general case,
hence improving the compression ratio.

Contributor

coqbot-app bot commented Apr 20, 2024

🏁 Bench results:

┌─────────────────────────────────────┬─────────────────────────┬───────────────────────────────────────┬───────────────────────────────────────┬─────────────────────────┐
│                                     │      user time [s]      │              CPU cycles               │           CPU instructions            │  max resident mem [KB]  │
│                                     │                         │                                       │                                       │                         │
│            package_name             │   NEW      OLD    PDIFF │      NEW             OLD        PDIFF │      NEW             OLD        PDIFF │   NEW      OLD    PDIFF │
├─────────────────────────────────────┼─────────────────────────┼───────────────────────────────────────┼───────────────────────────────────────┼─────────────────────────┤
│ coq-neural-net-interp-computed-lite │  290.98   296.24  -1.78 │  1327910853559   1353468034008  -1.89 │  3427598241248   3427134562489   0.01 │ 1158072  1124592   2.98 │
│               coq-mathcomp-fingroup │   30.53    30.94  -1.33 │   138820782892    140472186531  -1.18 │   209055512576    209084087942  -0.01 │  565508   564296   0.21 │
│                       coq-equations │    7.59     7.66  -0.91 │    31362864949     31488557656  -0.40 │    50995652771     51015248799  -0.04 │  391552   386408   1.33 │
│          coq-performance-tests-lite │  711.48   717.60  -0.85 │  3213440487315   3240467181184  -0.83 │  5651106856100   5647462151117   0.06 │ 1589304  1588016   0.08 │
│                        coq-coqprime │   48.23    48.62  -0.80 │   217015208974    218383707220  -0.63 │   334775789193    334728934351   0.01 │  789404   789600  -0.02 │
│                 coq-category-theory │  690.80   695.71  -0.71 │  3141977379544   3164082173672  -0.70 │  5259529627918   5260819330897  -0.02 │  958096   954408   0.39 │
│                coq-metacoq-template │  150.70   151.70  -0.66 │   669698431550    672699523578  -0.45 │  1047617659166   1047650521562  -0.00 │ 1512580  1512248   0.02 │
│                           coq-verdi │   49.09    49.38  -0.59 │   221011001589    222086375562  -0.48 │   341249794223    341305743004  -0.02 │  530600   530856  -0.05 │
│                coq-mathcomp-algebra │  235.76   237.09  -0.56 │  1075995770930   1081750029933  -0.53 │  1771908704536   1772068445848  -0.01 │ 1291276  1289132   0.17 │
│                  coq-mathcomp-field │  134.90   135.63  -0.54 │   615598539672    619058600530  -0.56 │  1015406324790   1015424793823  -0.00 │ 1411156  1408968   0.16 │
│             coq-metacoq-safechecker │  420.56   422.64  -0.49 │  1925366157137   1934456404513  -0.47 │  3197685957758   3197639789111   0.00 │ 2077108  2076720   0.02 │
│                             coq-vst │  878.71   882.65  -0.45 │  3991454827407   4011914432607  -0.51 │  6765990248010   6765629663265   0.01 │ 2150296  2149196   0.05 │
│                        coq-bedrock2 │  360.38   361.93  -0.43 │  1641664160532   1648366738175  -0.41 │  3101430767486   3103849558728  -0.08 │  912556   900684   1.32 │
│                       coq-fiat-core │   60.02    60.27  -0.41 │   251293892846    251294565162  -0.00 │   370014544316    369972969401   0.01 │  482880   482980  -0.02 │
│            coq-metacoq-translations │   17.03    17.10  -0.41 │    76017436142     76249327095  -0.30 │   124930828668    124904908551   0.02 │  845428   848256  -0.33 │
│                      coq-coquelicot │   39.93    40.09  -0.40 │   177180882908    177777286476  -0.34 │   250652183959    250560540700   0.04 │  857492   854452   0.36 │
│              coq-mathcomp-odd-order │  783.94   787.03  -0.39 │  3589983821586   3603469176234  -0.37 │  6010408676614   6010467094578  -0.00 │ 1613692  1613372   0.02 │
│                   coq-metacoq-pcuic │  983.20   986.98  -0.38 │  4404972986289   4422949011968  -0.41 │  6476124960985   6477689917928  -0.02 │ 2687444  2687500  -0.00 │
│         coq-rewriter-perf-SuperFast │  785.54   787.97  -0.31 │  3570507235681   3582046630445  -0.32 │  6197420420250   6197274917895   0.00 │ 1592120  1591444   0.04 │
│                        coq-rewriter │  389.20   390.31  -0.28 │  1769069661894   1775169648432  -0.34 │  2958784662727   2958830042935  -0.00 │ 1514348  1516980  -0.17 │
│        coq-fiat-crypto-with-bedrock │ 6197.92  6215.57  -0.28 │ 28228484891065  28312174313789  -0.30 │ 50355138221813  50350744383991   0.01 │ 3246088  3246076   0.00 │
│                            coq-corn │  724.55   726.38  -0.25 │  3291890306168   3300306490865  -0.26 │  5127631673616   5127578983339   0.00 │  760216   761268  -0.14 │
│                         coq-unimath │ 2431.62  2436.20  -0.19 │ 11073393399332  11092689178989  -0.17 │ 21928257910173  21924042156055   0.02 │ 1254648  1254340   0.02 │
│                            coq-hott │  157.74   158.02  -0.18 │   704147505006    707077714237  -0.41 │  1116522189145   1117606602086  -0.10 │  531828   546848  -2.75 │
│                        coq-compcert │  282.51   282.95  -0.16 │  1277480860583   1280984519484  -0.27 │  1945401955897   1945100011553   0.02 │ 1200900  1165196   3.06 │
│              coq-mathcomp-character │  103.62   103.71  -0.09 │   473315452291    473879780274  -0.12 │   745043049106    745064239392  -0.00 │ 1026448  1026060   0.04 │
│                 coq-metacoq-erasure │  502.14   502.54  -0.08 │  2290506190011   2292723752033  -0.10 │  3589468750161   3589270511276   0.01 │ 2142348  2142644  -0.01 │
│               coq-engine-bench-lite │  156.75   156.87  -0.08 │   663248405879    664587213924  -0.20 │  1228526043796   1227571386480   0.08 │ 1127280  1236824  -8.86 │
│                          coq-stdlib │  363.19   363.44  -0.07 │  1536961568442   1539580073221  -0.17 │  1307244682189   1307069491125   0.01 │  721220   712468   1.23 │
│                   coq-iris-examples │  469.20   469.47  -0.06 │  2133418268363   2134494353759  -0.05 │  3279185635234   3277087000725   0.06 │ 1120360  1115436   0.44 │
│                       coq-fourcolor │ 1347.58  1347.97  -0.03 │  6161686343577   6164774904159  -0.05 │ 12152235432431  12152011267787   0.00 │ 2129612  2133224  -0.17 │
│                    coq-math-classes │   85.87    85.87   0.00 │   386200279842    385878331860   0.08 │   536696938969    536659061849   0.01 │  506504   505836   0.13 │
│               coq-mathcomp-solvable │  118.30   118.24   0.05 │   539906262571    539777368602   0.02 │   857731989828    857417415748   0.04 │  861400   859260   0.25 │
│              coq-mathcomp-ssreflect │   92.93    92.85   0.09 │   424334963639    423282828564   0.25 │   664879638126    664892130173  -0.00 │ 1379004  1379016  -0.00 │
│                    coq-fiat-parsers │  312.37   312.01   0.12 │  1394420914661   1392731330397   0.12 │  2436162946847   2436169833530  -0.00 │ 2379344  2402000  -0.94 │
│                                 coq │  724.70   723.57   0.16 │  3049930627670   3050195971873  -0.01 │  5448552069862   5443553617724   0.09 │ 2466984  2478496  -0.46 │
│                           coq-color │  254.04   253.36   0.27 │  1142550227617   1140072756083   0.22 │  1642653294072   1642593046170   0.00 │ 1203588  1205472  -0.16 │
│                      coq-verdi-raft │  584.50   582.31   0.38 │  2657472199038   2648967951836   0.32 │  4172383679699   4172782133791  -0.01 │  846328   843652   0.32 │
│                         coq-coqutil │   42.94    42.75   0.44 │   189938552530    189017689788   0.49 │   272353759553    272247180024   0.04 │  563952   564188  -0.04 │
│                            coq-core │  128.31   127.67   0.50 │   499833132420    501568716575  -0.35 │   530601738567    530520661848   0.02 │  457136   456140   0.22 │
│                         coq-bignums │   29.98    29.74   0.81 │   135812414266    134863436260   0.70 │   193512166855    193345601504   0.09 │  478824   478928  -0.02 │
└─────────────────────────────────────┴─────────────────────────┴───────────────────────────────────────┴───────────────────────────────────────┴─────────────────────────┘

🐢 Top 25 slow downs
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              TOP 25 SLOW DOWNS                                                               │
│                                                                                                                                              │
│   OLD       NEW      DIFF   %DIFF    Ln                     FILE                                                                             │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│  62.3430   63.3860  1.0430   1.67%   609  coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html      │
│  26.2760   27.2180  0.9420   3.59%    35  coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/MontgomeryLadderRISCV.v.html               │
│  28.0630   28.9560  0.8930   3.18%    32  coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/MontgomeryLadderRISCV.v.html               │
│  14.2890   14.8380  0.5490   3.84%  1505  coq-vst/floyd/VSU.v.html                                                                           │
│   1.6240    2.1560  0.5320  32.76%    20  coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html                                        │
│   2.0480    2.5570  0.5090  24.85%    32  coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html                                        │
│  17.4980   17.9430  0.4450   2.54%    32  coq-performance-tests-lite/src/pattern.v.html                                                      │
│   7.1910    7.5880  0.3970   5.52%  1503  coq-vst/floyd/VSU.v.html                                                                           │
│   1.0400    1.4110  0.3710  35.67%   854  coq-stdlib/FSets/FMapAVL.v.html                                                                    │
│ 155.3850  155.7300  0.3450   0.22%  1190  coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html                      │
│   3.1450    3.4420  0.2970   9.44%   122  coq-stdlib/setoid_ring/Ncring_initial.v.html                                                       │
│   7.4060    7.6760  0.2700   3.65%  1501  coq-vst/floyd/VSU.v.html                                                                           │
│  72.7830   73.0510  0.2680   0.37%   905  coq-unimath/UniMath/ModelCategories/Generated/LNWFSCocomplete.v.html                               │
│  80.7710   81.0350  0.2640   0.33%    48  coq-fiat-crypto-with-bedrock/src/Curves/Weierstrass/AffineProofs.v.html                            │
│   0.8270    1.0780  0.2510  30.35%   384  coq-stdlib/MSets/MSetAVL.v.html                                                                    │
│   3.4070    3.6570  0.2500   7.34%  1060  coq-unimath/UniMath/CategoryTheory/EnrichedCats/Examples/FunctorCategory.v.html                    │
│  23.9500   24.1950  0.2450   1.02%    85  coq-fiat-crypto-with-bedrock/src/Curves/Montgomery/AffineProofs.v.html                             │
│  24.5230   24.7540  0.2310   0.94%    12  coq-fourcolor/theories/job517to530.v.html                                                          │
│   5.5890    5.8150  0.2260   4.04%   308  coq-iris-examples/theories/logrel/F_mu_ref_conc/binary/examples/stack/refinement.v.html            │
│  17.5130   17.7240  0.2110   1.20%   481  coq-verdi-raft/theories/RaftProofs/EndToEndLinearizability.v.html                                  │
│   0.8290    1.0330  0.2040  24.61%   816  coq-stdlib/MSets/MSetRBT.v.html                                                                    │
│  25.5670   25.7680  0.2010   0.79%    12  coq-fourcolor/theories/job190to206.v.html                                                          │
│   0.4820    0.6770  0.1950  40.46%    82  coq-stdlib/Numbers/Cyclic/Int63/Sint63.v.html                                                      │
│  23.3020   23.4960  0.1940   0.83%    12  coq-fourcolor/theories/job486to489.v.html                                                          │
│   4.4290    4.6200  0.1910   4.31%     5  coq-fiat-crypto-with-bedrock/src/Assembly/Parse/Examples/fiat_p256_square_optimised_seed103.v.html │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
🐇 Top 25 speed ups
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                             TOP 25 SPEED UPS                                                              │
│                                                                                                                                           │
│   OLD       NEW      DIFF     %DIFF    Ln                    FILE                                                                         │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 254.5780  248.8550  -5.7230   -2.25%     8  coq-neural-net-interp-computed-lite/theories/MaxOfTwoNumbersSimpler/Computed/AllLogits.v.html │
│  48.8640   46.2710  -2.5930   -5.31%   110  coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/full_mul.v.html  │
│ 218.9550  216.6910  -2.2640   -1.03%   103  coq-fiat-crypto-with-bedrock/src/Arithmetic/BarrettReduction.v.html                           │
│  96.3470   94.5660  -1.7810   -1.85%   999  coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html               │
│  96.2340   94.5380  -1.6960   -1.76%   968  coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html               │
│  66.1640   65.0260  -1.1380   -1.72%    27  coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ToFancyWithCasts.v.html                      │
│ 132.9220  131.8880  -1.0340   -0.78%    22  coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html                        │
│  44.7080   43.9180  -0.7900   -1.77%   827  coq-vst/veric/binop_lemmas4.v.html                                                            │
│   2.3200    1.5980  -0.7220  -31.12%   196  coq-stdlib/setoid_ring/Ncring_tac.v.html                                                      │
│ 256.2570  255.5380  -0.7190   -0.28%  1629  coq-metacoq-pcuic/pcuic/theories/PCUICSR.v.html                                               │
│   4.0550    3.4060  -0.6490  -16.00%   490  coq-stdlib/Reals/Cauchy/ConstructiveCauchyRealsMult.v.html                                    │
│  10.7310   10.1450  -0.5860   -5.46%   307  coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/EdwardsXYZT.v.html                    │
│   8.4700    8.0120  -0.4580   -5.41%   359  coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/EdwardsXYZT.v.html                    │
│  17.5190   17.0770  -0.4420   -2.52%   607  coq-mathcomp-odd-order/theories/PFsection9.v.html                                             │
│  28.8410   28.4100  -0.4310   -1.49%   144  coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/GarageDoorTop.v.html                  │
│  13.1890   12.7710  -0.4180   -3.17%  1028  coq-unimath/UniMath/CategoryTheory/LocalizingClass.v.html                                     │
│  29.3120   28.9020  -0.4100   -1.40%    12  coq-fourcolor/theories/job001to106.v.html                                                     │
│  21.8020   21.4400  -0.3620   -1.66%    40  coq-fiat-crypto-with-bedrock/src/PushButtonSynthesis/SolinasReductionReificationCache.v.html  │
│   5.0080    4.6470  -0.3610   -7.21%   111  coq-bedrock2/bedrock2/src/bedrock2Examples/full_mul.v.html                                    │
│  18.9430   18.5860  -0.3570   -1.88%   957  coq-unimath/UniMath/CategoryTheory/Monoidal/Examples/DisplayedCartesianMonoidal.v.html        │
│  87.4760   87.1350  -0.3410   -0.39%   365  coq-mathcomp-odd-order/theories/PFsection4.v.html                                             │
│  99.7400   99.4130  -0.3270   -0.33%    20  coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html                                   │
│  23.3530   23.0350  -0.3180   -1.36%   296  coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/EdwardsXYZT.v.html                    │
│  39.8580   39.5490  -0.3090   -0.78%   236  coq-rewriter/src/Rewriter/Rewriter/Examples/PerfTesting/LiftLetsMap.v.html                    │
│  47.6170   47.3140  -0.3030   -0.64%   558  coq-bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

@@ -513,8 +580,7 @@ let to_memory fv code =
     reloc_info = RelocTable.create 91;
   } in
   emit env code [];
-  (** Later uses of this string are all purely functional *)
-  let code = Bytes.sub_string env.out_buffer 0 env.out_position in
+  let code = compress_code env.out_buffer env.out_position in
Member

I think that this is the wrong place to perform the compression, because when evaluating non-constants in the VM, the code will be decompressed immediately afterwards. This was one of the reasons why I introduced a VM library data structure on disk; the compression should occur when building it.

Contributor Author

Good point. That said, I would not be surprised if non-constants were to be rather short, bytecode-wise. (They might be huge, term-wise, especially when doing computational reflection, but most of that size will come from block values whose associated bytecode is trivial.) So, the cost of compressing/decompressing might be in the noise and not worth complicating the code. For example, heavy vm_compute users like Four-Color did not suffer from any slowdown.

@silene
Contributor Author

silene commented Apr 20, 2024

So, on the whole bench, the overall reduction of the .vo size is about 20%, with files from HoTT being reduced by more than 70% (?!).

@SkySkimmer
Contributor

I think HoTT has a higher than usual proportion of Defined vs Qed. I would expect similar results on unimath.

@silene
Contributor Author

silene commented Apr 20, 2024

Sure. But think about it. By design, this patch cannot shrink the bytecode segment of a .vo file by more than 75% (and in practice, it is more like 65-70%). So, if there is a 73% reduction on the whole .vo file (e.g., Colimit_Flattening.vo), it means that more than 98% of the file is currently occupied by the bytecode segment. And since the last few bytes are presumably occupied by the relocation segment for the bytecode, it means that such an 8.3 MB file is basically information-free, hence my astonishment.

@silene
Contributor Author

silene commented Apr 20, 2024

As an illustration, here is what the factorial function of the standard library looks like before and after this patch.

Before, 240 bytes:

2c 00 00 00  01 00 00 00  00 00 00 00  00 00 00 00 
2c 00 00 00  02 00 00 00  88 00 00 00  2a 00 00 00 
00 00 00 00  28 00 00 00  00 00 00 00  3b 00 00 00 
01 00 00 02  15 00 00 00  03 00 00 00  0a 00 00 00 
4e 00 00 00  15 00 00 00  f9 ff ff ff  00 00 00 00 
01 00 00 00  27 00 00 00  01 00 00 00  3d 00 00 00 
09 00 00 00  31 00 00 00  1f 00 00 00  0a 00 00 00 
37 00 00 00  01 00 00 00  35 00 00 00  01 00 00 00 
25 00 00 00  04 00 00 00  34 00 00 00  02 00 00 00 
27 00 00 00  01 00 00 00  2b 00 00 00  00 00 00 00 
04 00 00 00  12 00 00 00  01 00 00 00  88 00 00 00 
34 00 00 00  03 00 00 00  27 00 00 00  01 00 00 00 
2b 00 00 00  00 00 00 00  06 00 00 00  35 00 00 00 
03 00 00 00  38 00 00 00  00 00 00 00  88 00 00 00 
34 00 00 00  03 00 00 00  27 00 00 00  01 00 00 00 

After, 63 bytes:

2c 01 00 00  2c 02 88 2a  00 28 00 3b  fd 02 01 15 
03 0a 4e 15  fc f9 00 01  27 01 3d 09  31 1f 0a 37 
01 35 01 25  04 34 02 27  01 2b 00 04  12 01 88 34 
03 27 01 2b  00 06 35 03  38 00 88 34  03 27 01 
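
The sizes quoted for the two dumps above can be checked mechanically by counting the hex bytes. The sketch below does only that; `count_bytes`, `before_rows` and `after_rows` are names introduced here for illustration, and the rows are copied from the dumps as shown in this thread.

```ocaml
(* Count the bytes in a hex dump given as a list of space-separated rows. *)
let count_bytes rows =
  rows
  |> List.concat_map (String.split_on_char ' ')
  |> List.filter (fun tok -> tok <> "")
  |> List.length

let before_rows = [
  "2c 00 00 00  01 00 00 00  00 00 00 00  00 00 00 00";
  "2c 00 00 00  02 00 00 00  88 00 00 00  2a 00 00 00";
  "00 00 00 00  28 00 00 00  00 00 00 00  3b 00 00 00";
  "01 00 00 02  15 00 00 00  03 00 00 00  0a 00 00 00";
  "4e 00 00 00  15 00 00 00  f9 ff ff ff  00 00 00 00";
  "01 00 00 00  27 00 00 00  01 00 00 00  3d 00 00 00";
  "09 00 00 00  31 00 00 00  1f 00 00 00  0a 00 00 00";
  "37 00 00 00  01 00 00 00  35 00 00 00  01 00 00 00";
  "25 00 00 00  04 00 00 00  34 00 00 00  02 00 00 00";
  "27 00 00 00  01 00 00 00  2b 00 00 00  00 00 00 00";
  "04 00 00 00  12 00 00 00  01 00 00 00  88 00 00 00";
  "34 00 00 00  03 00 00 00  27 00 00 00  01 00 00 00";
  "2b 00 00 00  00 00 00 00  06 00 00 00  35 00 00 00";
  "03 00 00 00  38 00 00 00  00 00 00 00  88 00 00 00";
  "34 00 00 00  03 00 00 00  27 00 00 00  01 00 00 00";
]

let after_rows = [
  "2c 01 00 00  2c 02 88 2a  00 28 00 3b  fd 02 01 15";
  "03 0a 4e 15  fc f9 00 01  27 01 3d 09  31 1f 0a 37";
  "01 35 01 25  04 34 02 27  01 2b 00 04  12 01 88 34";
  "03 27 01 2b  00 06 35 03  38 00 88 34  03 27 01";
]

let before_size = count_bytes before_rows   (* 60 words of 4 bytes *)
let after_size = count_bytes after_rows

(* Fraction of the original size that was saved. *)
let reduction = 1. -. float_of_int after_size /. float_of_int before_size
```

The counts come out at 240 and 63 bytes, a reduction of 73.75%, close to the 75% ceiling that a one-byte-per-word encoding allows. Only two of the 60 words need more than one byte in the compressed dump: the one stored as `fd 02 01` and the one stored as `fc f9`.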

@ppedrot
Member

ppedrot commented Apr 20, 2024

it means that more than 98% of the file is currently occupied by the bytecode segment

This is in line with some profiling I had done before, so I'm not really surprised.

@silene
Contributor Author

silene commented Apr 21, 2024

@coqbot run full ci

@coqbot-app coqbot-app bot removed the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Apr 21, 2024
@silene
Contributor Author

silene commented Apr 21, 2024

The thing is, the size of the bytecode is directly proportional to the size of the term, since there is no optimization such as inlining. So, to get 2 million opcodes from a minuscule term, the only explanation I can imagine is the existence of an exponential amount of sharing in that term, which I did not think could happen in a non-artificial use case.

@ppedrot
Member

ppedrot commented Apr 21, 2024

the size of the bytecode is directly proportional to the size of the term

This is not quite true: the current compilation scheme of closures is quadratic in the number of binders. I think this is a very bad design choice for symbolic code such as the kind encountered in UniMath proofs, and more generally for all the Coq definitions never meant to be sent to the VM.

@silene
Contributor Author

silene commented Apr 22, 2024

Right, this had eluded me. Just to be sure, I suppose you are talking about a term of the form fun H1 H2 ... Hn H => H1 (fun x1 => H2 (fun x2 => ... Hn (fun xn => H x1 x2 ... xn))).

That said, from a computational point of view, I don't think there is a way around it. Indeed, it is important that a closure does not spuriously keep terms alive. So, it should not store more values in its closure than strictly needed for its evaluation. For example, the closure fun xn => ... should keep alive none of H1 to Hn.

I suppose we could add an opcode that copies a segment of the closure environment on the stack (i.e., a PUSHENVACC on steroids). This would remove the quadratic behavior, assuming that all the closures store their free variables in the same order. (No idea if this assumption holds. The bytecode compiler might need some tweaks.)
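
To see where the quadratic behavior comes from in a term of the shape quoted above, one can count the free variables that each nested closure has to capture. The sketch below is only a counting model, not VM code: `captured` and `total_captured` are hypothetical names, and it assumes that the cost of building a closure grows with the number of variables it captures.

```ocaml
(* Free variables of the closure [fun xk => H(k+1) (...)], for 1 <= k <= n:
   H(k+1) .. Hn, then H, then x1 .. x(k-1). *)
let captured n k = (n - k) + 1 + (k - 1)

(* Total number of captured variables over the n nested closures. *)
let total_captured n =
  let rec sum k acc =
    if k > n then acc else sum (k + 1) (acc + captured n k)
  in
  sum 1 0
```

Each of the n closures captures exactly n variables, so `total_captured n` is n squared (100 for n = 10), even though none of the closures keeps alive a variable it does not need.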

@ppedrot
Member

ppedrot commented Apr 22, 2024

it is important that a closure does not spuriously keep terms alive

While I agree with this claim for your run-of-the-mill call-by-value boolean evaluator, it is not clear to me at all that the same trade-off is worth it for symbolic code, where many bound variables occur with probability 1 on average within subclosures (e.g. in type annotations). I think we should experiment.

@JasonGross
Member

How much trouble would it be to support both sorts of closure compilation schemes? It would be nice to be able to manually annotate some definitions as "symbolic-like closures".

@silene
Contributor Author

silene commented Apr 23, 2024

Since #18964 did not reduce the size of the generated bytecode of coq-hott, I went and actually read it. Closures have nothing to do with the size of bytecode. The blowup is almost entirely caused by universe computations. Consider for example transport_E'_V:

fun (i j : G) (g : G i j) (x : D i) (y : E (i; x)) => moveR_transport_V E' (colimp i j g x) y (E_f g x y) (transport_E' g x y)^

The bytecode is about 500 opcodes long (before peephole optimization), 75% of which are there just for creating universe instances. More precisely, 14 instances are created for this small term, of about 6 universes each, with an average cost of 4 opcodes per universe.

@silene
Contributor Author

silene commented Apr 23, 2024

Actually, my mistake, these are the numbers before the end of the section. This example gets even worse once the section is closed, since there are now 15 instances, with up to 11 universes, for a total of 94 universes, and the average cost per universe grows to 4.5 opcodes. Most of these instances are identical, hence shared, in the Coq term, but they get duplicated once compiled to bytecode.
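
The difference between shared and duplicated instances can be pictured with a small interning table. This is only an illustration, assuming that an instance can be modeled as an array of universe indices; `instance`, `intern` and the slot numbering are hypothetical and are not Coq's data structures.

```ocaml
(* Hypothetical model: an instance is an array of universe indices. *)
type instance = int array

(* Return the slot of [inst], allocating a new one only the first time a
   structurally equal instance is seen. *)
let intern (table : (instance, int) Hashtbl.t) (inst : instance) =
  match Hashtbl.find_opt table inst with
  | Some slot -> slot
  | None ->
    let slot = Hashtbl.length table in
    Hashtbl.add table inst slot;
    slot

let table : (instance, int) Hashtbl.t = Hashtbl.create 16

(* Four occurrences, but only two distinct instances. *)
let occurrences = [ [|0; 1; 2|]; [|0; 1; 2|]; [|3; 4|]; [|0; 1; 2|] ]
let slots = List.map (intern table) occurrences
```

Here the four occurrences map to slots `[0; 0; 1; 0]` and the table holds two entries. Emitting code per occurrence pays for all four, whereas emitting it per table entry pays for two.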

SkySkimmer added a commit to SkySkimmer/coq that referenced this pull request Apr 23, 2024
instead of implementing the substitution in bytecode unrolled over the
instance to substitute.

Since the instances are now handled through structured constants even
when variable they get deduplicated, reducing vo size significantly in
universe polymorphic code.

For instance HoTT total theories/ vo size 141MB -> 90MB
HoTT Colimit_Flattening.vo size 8.7MB -> 3.4MB

cf discussion around coq#18959 (comment)
SkySkimmer added a commit to SkySkimmer/coq that referenced this pull request Apr 23, 2024
instead of implementing the substitution in bytecode unrolled over the
instance to substitute.

Since the instances are now handled through structured constants even
when variable they get deduplicated, reducing vo size significantly in
universe polymorphic code.

For instance HoTT total theories/ vo size 141MB -> 90MB
HoTT Colimit_Flattening.vo size 8.7MB -> 3.4MB
list with all files: https://gist.github.com/SkySkimmer/5d9eda76b016404b82239bbe352a9c6d

cf discussion around coq#18959 (comment)
@SkySkimmer
Contributor

cf #18968 to try to reduce univ instance vm code size

@ppedrot
Member

ppedrot commented Apr 23, 2024

Good catch, I didn't think about the universe instances...

@silene
Contributor Author

silene commented May 3, 2024

I am unable to reproduce the marshal failure on output/Partac.v (https://gitlab.inria.fr/coq/coq/-/jobs/4266038) with 4.14.1+flambda. Could someone with CI rights relaunch the job? Just to know if there is some actual issue or if it was a cosmic ray.

@SkySkimmer
Contributor

launched

@silene
Contributor Author

silene commented May 3, 2024

So, cosmic ray it was.

@github-actions github-actions bot added the needs: rebase Should be rebased on the latest master to solve conflicts or have a newer CI run. label May 7, 2024
@SkySkimmer SkySkimmer requested a review from a team May 7, 2024 12:09