From 8499010b91a2c7496d6af74cce35a6b4e0378633 Mon Sep 17 00:00:00 2001
From: Marvin Borner
Date: Sat, 20 May 2023 14:13:45 +0200
Subject: Added testing flag

still not able to find the 8cc bug but idc because I tested it with many programs and it probably won't be an issue for now
---
 readme.md | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

(limited to 'readme.md')

diff --git a/readme.md b/readme.md
index d42b2ab..80da044 100644
--- a/readme.md
+++ b/readme.md
@@ -71,7 +71,7 @@ A possible encoding in BLoC:
 | 0x04 | 0x06 | number of entries: 3 |
 | 0x06 | 0x17 | encoded `M`: gets a bit longer due to different encoding |
 | 0x17 | 0x28 | encoded `N`: gets a bit longer due to different encoding |
-| 0x28 | 0x34 | `0001001001001100011011011`, where `=00{1}` and `=00{1}` are indices with length of 1.25 byte |
+| 0x28 | 0x34 | `0001001001001100011011011`, where `=00{1}` and `=00{0}` are indices with length of 1.25 byte |
 
 Even in this small example BLoC uses less space than BLC (0x34 vs. 0x42
 bytes). Depending on the content of `M` and `N`, this could have
@@ -95,12 +95,18 @@ shortest one (as fully reduced expressions aren’t necessarily shorter).
 
 ## Improvements
 
-The current optimizer does not always make the best deduplication
+There seem to be problems with *very* big files:
+[8cc](https://github.com/woodrush/lambda-8cc) does not pass the bloc-blc
+comparison test. I’ve not been able to reproduce this bug with any other
+file and 8cc itself is too huge to comfortably debug the issue. If
+you’re reading this: Please help me :(
+
+Also the current optimizer does not always make the best deduplication
 choices. It seems like finding the optimal deduplications requires quite
 complex algorithms which would probably be rather inefficient. For
 example, as of right now the length of an expression as seen by the
-deduplicator doesn’t consider the change of length when sub-expressions
-get replaced by a reference (of unknown bit length!) to another
-expression. This results in entries like `(0 <1234>)` that would not
-have needed to be deduplicated.
+deduplicator doesn’t consider the change of occurrence count when
+sub-expressions get replaced by a reference to another expression. This
+results in entries like `(0 <1234>)` that would not have needed to be
+deduplicated.
 
-- 
cgit v1.2.3
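The occurrence-count problem described in the new Improvements paragraph can be illustrated with a small toy model. This is a minimal sketch, not BLoC's deduplicator: the tuple-based de Bruijn term representation and the helpers `subterms`, `count_occurrences` and `replace` are hypothetical, chosen only to show how extracting one sub-expression changes how often another one occurs.

```python
from collections import Counter

# Toy de Bruijn terms as nested tuples (hypothetical, not BLoC's real structures):
#   ("var", n)     de Bruijn index n
#   ("abs", body)  lambda abstraction
#   ("app", f, x)  application
#   ("ref", k)     reference to deduplicated table entry k


def subterms(term):
    """Yield term and every sub-expression nested inside it."""
    yield term
    if term[0] == "abs":
        yield from subterms(term[1])
    elif term[0] == "app":
        yield from subterms(term[1])
        yield from subterms(term[2])


def count_occurrences(term):
    """Count how often each sub-expression occurs in term."""
    return Counter(subterms(term))


def replace(term, target, ref):
    """Replace every occurrence of target in term by the reference ref."""
    if term == target:
        return ref
    if term[0] == "abs":
        return ("abs", replace(term[1], target, ref))
    if term[0] == "app":
        return ("app", replace(term[1], target, ref), replace(term[2], target, ref))
    return term  # variables and references have no children


# B is a closed sub-expression that only ever appears inside A.
B = ("abs", ("abs", ("app", ("var", 1), ("var", 0))))
A = ("app", ("var", 0), B)
M = ("abs", ("app", ("app", A, A), A))

before = count_occurrences(M)
print(before[A], before[B])  # 3 3: both look like good candidates

# Deduplicate A: M now holds three references, A lives once in the table.
table = [A]
M_dedup = replace(M, A, ("ref", 0))
after = count_occurrences(M_dedup) + sum(map(count_occurrences, table), Counter())
print(after[B])  # 1: B is left only inside A's table entry
```

Before any replacement `B` seems to occur three times, but once every `A` has been replaced by a reference, only the single copy inside `A`'s table entry is left, so deduplicating `B` as well would not save anything. Replacing `B` first instead turns `A` into `("app", ("var", 0), ("ref", 0))`, which in this toy model has the same shape as the `(0 <1234>)` entries mentioned in the readme.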