hpdcache: update the submodule #2940


Open · wants to merge 2 commits into base: master

Conversation


@cfuguet cfuguet commented Apr 18, 2025

Update the HPDcache submodule into CVA6


❌ failed run, report available here.


👋 Hi there!

This pull request seems inactive. Need more help or have updates? Feel free to let us know. If there are no updates within the next few days, we'll go ahead and close this PR. 😊

@github-actions github-actions bot added the Status:Stale Issue or PR is stale and hasn't received any updates. label May 19, 2025
@JeanRochCoulon JeanRochCoulon removed the Status:Stale Issue or PR is stale and hasn't received any updates. label May 19, 2025
@cfuguet cfuguet force-pushed the dev/updt_hpdcache branch from b8a0f96 to bb40929 Compare May 26, 2025 14:21

❌ failed run, report available here.

@cfuguet cfuguet force-pushed the dev/updt_hpdcache branch from bb40929 to 000c336 Compare May 26, 2025 19:53
@cfuguet cfuguet marked this pull request as ready for review May 26, 2025 19:55

❌ failed run, report available here.


cfuguet commented May 27, 2025

Hello @JeanRochCoulon,

Could you review and merge this PR?

It updates the HPDcache submodule to its latest version, which contains some bugfixes and performance improvements.

Thank you!

@cfuguet cfuguet requested a review from JeanRochCoulon May 27, 2025 08:20
@cfuguet cfuguet added the Component:RTL For issues in the RTL (e.g. for files in the rtl directory) label May 27, 2025
@cfuguet cfuguet force-pushed the dev/updt_hpdcache branch 2 times, most recently from cad00d6 to 9950fbc Compare June 6, 2025 07:59

Bill94l commented Jun 11, 2025

Hi @cfuguet

I was looking at the following line:

userCfg.dataWaysPerRamWord = __minu(CVA6Cfg.DCACHE_SET_ASSOC, 128 / CVA6Cfg.XLEN);

Do you think the mapping would be more optimal if this value were computed based on the cache line width instead of 128?

userCfg.dataWaysPerRamWord = __minu(CVA6Cfg.DCACHE_SET_ASSOC, CVA6Cfg.DCACHE_LINE_WIDTH / CVA6Cfg.XLEN);

Thanks!
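To make the comparison concrete, here is a small Python sketch of the two formulas (the configuration values are hypothetical, and `minu` merely stands in for the `__minu` unsigned-minimum helper):

```python
def minu(a: int, b: int) -> int:
    """Unsigned minimum, standing in for the __minu helper."""
    return a if a < b else b

# Hypothetical configuration values, for illustration only
DCACHE_SET_ASSOC = 8
XLEN = 64
DCACHE_LINE_WIDTH = 512

# Current formula: RAM word capped at a fixed 128 bits
ways_fixed = minu(DCACHE_SET_ASSOC, 128 // XLEN)

# Proposed formula: RAM word sized from the cache line width
ways_line = minu(DCACHE_SET_ASSOC, DCACHE_LINE_WIDTH // XLEN)

print(ways_fixed, ways_line)  # 2 vs 8 ways per RAM word
```

Under these example values the proposed formula packs all 8 ways into one RAM word instead of 2, which is the mapping change being discussed.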


Bill94l commented Jun 11, 2025

If we update the dataWaysPerRamWord parameter to be computed dynamically based on the cache line width, then the hpdcache_sram_wbyteenable_1rw module needs to be adapted to correctly support wider data paths (e.g., 128, 256, or 512 bits).

Here are the changes I made to support 128, 256, and 512-bit configurations using a generic multi-bank structure (I can open a pull request if you agree with this approach):

git diff common/local/util/hpdcache_sram_wbyteenable_1rw.sv
diff --git a/common/local/util/hpdcache_sram_wbyteenable_1rw.sv b/common/local/util/hpdcache_sram_wbyteenable_1rw.sv
index d9f29961..8103d8d5 100644
--- a/common/local/util/hpdcache_sram_wbyteenable_1rw.sv
+++ b/common/local/util/hpdcache_sram_wbyteenable_1rw.sv
@@ -25,58 +25,74 @@ module hpdcache_sram_wbyteenable_1rw
     output logic [DATA_SIZE-1:0]   rdata
 );
 
-if (DATA_SIZE == 128) begin
-    // Split the data into two 64-bit halves
-    logic [DATA_SIZE/2-1:0] wdata_low, wdata_high;
-    logic [DATA_SIZE/2-1:0] rdata_low, rdata_high;
-    logic [7:0] be_low, be_high;
-    assign wdata_low  = wdata[63:0];
-    assign wdata_high = wdata[127:64];
-    assign be_low  = wbyteenable[7:0];
-    assign be_high = wbyteenable[15:8];
-
-    SyncSpRamBeNx64 #(
-        .ADDR_WIDTH(ADDR_SIZE),
-        .DATA_DEPTH(DEPTH),
-        .OUT_REGS  (0),
-        .SIM_INIT  (1)
-    ) SyncSpRam_0 (
-        .Clk_CI   (clk),
-        .Rst_RBI  (rst_n),
-        .CSel_SI  (cs),
-        .WrEn_SI  (we),          // Write to the low bank
-        .BEn_SI   (be_low),
-        .Addr_DI  (addr),
-        .WrData_DI(wdata_low),
-        .RdData_DO(rdata_low)
-    );
-
-    SyncSpRamBeNx64 #(
-        .ADDR_WIDTH(ADDR_SIZE),
-        .DATA_DEPTH(DEPTH),
-        .OUT_REGS  (0),
-        .SIM_INIT  (1)
-    ) SyncSpRam_1 (
-        .Clk_CI   (clk),
-        .Rst_RBI  (rst_n),
-        .CSel_SI  (cs),
-        .WrEn_SI  (we),          // Write to the high bank
-        .BEn_SI   (be_high),
-        .Addr_DI  (addr),
-        .WrData_DI(wdata_high),
-        .RdData_DO(rdata_high)
-    );
-
-    assign rdata = {rdata_high, rdata_low};
+// ------------------------------------------------------------------------
+// Generic multi-bank instantiation for 128, 256, and 512 bits
+//   - Each bank is 64 bits wide, with an 8-bit byte-enable
+//   - Number of banks = DATA_SIZE / 64
+// ------------------------------------------------------------------------
+if (DATA_SIZE == 128 || DATA_SIZE == 256 || DATA_SIZE == 512) begin
+    // Calculate the number of 64-bit banks needed
+    localparam int unsigned NBANKS = DATA_SIZE / 64;
+    // --------------------------------------------------------------------
+    // Declare packed arrays to hold per-bank write data, byte enables, 
+    // and read data. Each packed array collapses into a single vector:
+    //   - wdata_banks is NBANKS × 64 bits
+    //   - be_banks    is NBANKS ×  8 bits
+    //   - rdata_banks is NBANKS × 64 bits
+    // --------------------------------------------------------------------
+    logic [NBANKS-1:0][63:0] wdata_banks;
+    logic [NBANKS-1:0][ 7:0] be_banks;
+    logic [NBANKS-1:0][63:0] rdata_banks;
+    // --------------------------------------------------------------------
+    // Implicitly split the wide wdata and wbyteenable buses into the packed
+    // multi-dimensional arrays. Because these arrays are "packed-packed",
+    // the total bit-width matches DATA_SIZE exactly:
+    //   - wdata_banks[i] receives wdata[i*64 +: 64]
+    //   - be_banks[i]    receives wbyteenable[i*8 +: 8]
+    // --------------------------------------------------------------------
+    assign wdata_banks = wdata;           // Implicit splitting of DATA_SIZE bits into NBANKS × 64
+    assign be_banks    = wbyteenable;     // Implicit splitting of DATA_SIZE/8 bits into NBANKS × 8
+    // --------------------------------------------------------------------
+    // Instantiate one SyncSpRamBeNx64 per 64-bit bank
+    //   - All banks share the same clk, rst_n, cs, we, and addr signals
+    //   - Each bank i uses wdata_banks[i] and be_banks[i]
+    //   - The read result is captured into rdata_banks[i]
+    // --------------------------------------------------------------------
+    for (genvar i = 0; i < NBANKS; i++) begin : gen_sram_banks
+        SyncSpRamBeNx64 #(
+            .ADDR_WIDTH(ADDR_SIZE),
+            .DATA_DEPTH(DEPTH),
+            .OUT_REGS  (0),
+            .SIM_INIT  (1)
+        ) SyncSpRam_i (
+            .Clk_CI    (clk),
+            .Rst_RBI   (rst_n),
+            .CSel_SI   (cs),
+            .WrEn_SI   (we),
+            .BEn_SI    (be_banks[i]),
+            .Addr_DI   (addr),
+            .WrData_DI (wdata_banks[i]),
+            .RdData_DO (rdata_banks[i])
+        );
+    end
+    // --------------------------------------------------------------------
+    // Implicitly recombine rdata_banks (packed) into the full DATA_SIZE bus
+    //   - Since rdata_banks is packed-packed NBANKS × 64 bits, it matches
+    //     exactly the width of rdata
+    // --------------------------------------------------------------------
+    assign rdata = rdata_banks;
 
+// ------------------------------------------------------------------------
+// Single 64-bit bank instantiation for 64-bit data size
+// ------------------------------------------------------------------------
 end else if (DATA_SIZE == 64) begin
     SyncSpRamBeNx64 #(
       .ADDR_WIDTH(ADDR_SIZE),
-      .DATA_DEPTH(DEPTH), // usually 2**ADDR_WIDTH, but can be lower
+      .DATA_DEPTH(DEPTH),   // usually 2**ADDR_WIDTH, but can be lower
       .OUT_REGS  (0),
-      .SIM_INIT  (1)     // for simulation only, will not be synthesized
-                                   // 0: no init, 1: zero init, 2: random init
-                                   // note: on verilator, 2 is not supported. define the VERILATOR macro to work around.
+      .SIM_INIT  (1)        // for simulation only, will not be synthesized
+                            // 0: no init, 1: zero init, 2: random init
+                            // note: on verilator, 2 is not supported. define the VERILATOR macro to work around.
     )SyncSpRam_i(
       .Clk_CI   (clk),
       .Rst_RBI  (rst_n),
@@ -87,14 +103,17 @@ end else if (DATA_SIZE == 64) begin
       .WrData_DI(wdata),
       .RdData_DO(rdata)
     );
+// ------------------------------------------------------------------------
+// Single 32-bit bank instantiation for 32-bit data size
+// ------------------------------------------------------------------------
 end else if (DATA_SIZE == 32) begin
     SyncSpRamBeNx32 #(
       .ADDR_WIDTH(ADDR_SIZE),
-      .DATA_DEPTH(DEPTH), // usually 2**ADDR_WIDTH, but can be lower
+      .DATA_DEPTH(DEPTH),   // usually 2**ADDR_WIDTH, but can be lower
       .OUT_REGS  (0),
-      .SIM_INIT  (1)     // for simulation only, will not be synthesized
-                                   // 0: no init, 1: zero init, 2: random init
-                                   // note: on verilator, 2 is not supported. define the VERILATOR macro to work around.
+      .SIM_INIT  (1)        // for simulation only, will not be synthesized
+                            // 0: no init, 1: zero init, 2: random init
+                            // note: on verilator, 2 is not supported. define the VERILATOR macro to work around.
     )SyncSpRam_i(
       .Clk_CI   (clk),
       .Rst_RBI  (rst_n),
@@ -105,9 +124,11 @@ end else if (DATA_SIZE == 32) begin
       .WrData_DI(wdata),
       .RdData_DO(rdata)
     );
-
+// ------------------------------------------------------------------------
+// Unsupported data width: produce a fatal elaboration error
+// ------------------------------------------------------------------------
 end else begin
-   $fatal(1, "DATASIZE=%d, in not supported " ,DATA_SIZE);
+   $fatal(1, "hpdcache_sram_wbyteenable_1rw: DATA_SIZE=%0d is not supported", DATA_SIZE);
 end
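
The splitting and recombining done by the packed arrays above can be cross-checked with a small Python behavioral model (an editorial illustration of the banking scheme, not the RTL; one memory word per bank is enough to show the mapping):

```python
# Behavioral model of the multi-bank scheme: DATA_SIZE bits split into
# 64-bit banks, each gated by its own 8-bit byte enable.
DATA_SIZE = 256
NBANKS = DATA_SIZE // 64

# One word per bank (the RTL has DEPTH words; one suffices here)
banks = [bytearray(8) for _ in range(NBANKS)]

def write(wdata: bytes, wbyteenable: int) -> None:
    """Byte i of wdata is written only when bit i of wbyteenable is set."""
    for i in range(DATA_SIZE // 8):
        if (wbyteenable >> i) & 1:
            banks[i // 8][i % 8] = wdata[i]

def read() -> bytes:
    """Recombine the per-bank words into the full DATA_SIZE-bit value."""
    return b"".join(bytes(b) for b in banks)

write(bytes(range(32)), (1 << 32) - 1)  # full-width write, all bytes enabled
write(b"\xff" * 32, 0x1)                # partial write: only byte 0 enabled
assert read() == b"\xff" + bytes(range(1, 32))
```

Byte i lands in bank i // 8 at offset i % 8, which matches the packed-array split where wdata_banks[i] receives wdata[i*64 +: 64] and be_banks[i] receives wbyteenable[i*8 +: 8].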

Thanks ^^

Billal IGHILAHRIZ


cfuguet commented Jun 11, 2025

Hi @Bill94l,

Maybe you can open a separate issue to discuss this, as it is not directly related to this PR.

However, a quick comment about your changes: I need to check when the file you modified (common/local/util/hpdcache_sram_wbyteenable_1rw.sv) is actually used.

Actually, I provide SRAM wrappers that do not have the problems you mentioned (hpdcache/rtl/src/common/macros/behav/hpdcache_sram_wbyteenable_1rw.sv).

Those behavioral models work well on FPGA targets. If you target ASIC, then you need to generate the appropriate macros in the target technology and make wrappers.

In summary, I do not see why or when the common/local/util/hpdcache_sram_wbyteenable_1rw.sv module needs to be used.


cfuguet commented Jun 11, 2025

> Hi @cfuguet
>
> I was looking at the following line:
>
> userCfg.dataWaysPerRamWord = __minu(CVA6Cfg.DCACHE_SET_ASSOC, 128 / CVA6Cfg.XLEN);
>
> Do you think the mapping would be more optimal if this value were computed based on the cache line width instead of 128?
>
> userCfg.dataWaysPerRamWord = __minu(CVA6Cfg.DCACHE_SET_ASSOC, CVA6Cfg.DCACHE_LINE_WIDTH / CVA6Cfg.XLEN);
>
> Thanks!

Regarding this: actually, 128 is not related to the cache line width. It is kind of a "magic number" related to ASIC SRAM macros. In most of the recent ASIC technologies where we have explored SRAMs, 128 bits is the maximum width at which the macros have the best performance (access latency, power, and area). Beyond that, it is better to generate multiple macros and place them side by side.
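To make that trade-off concrete, a quick sketch (an editorial illustration: the 128-bit figure is the one quoted above, and the RAM word widths are hypothetical examples):

```python
import math

# ~128 bits is described above as the sweet spot for SRAM macro width;
# beyond that, multiple macros are placed side by side.
MACRO_WIDTH = 128

def n_macros(ram_word_width: int) -> int:
    """How many 128-bit macros cover a RAM word of the given width."""
    return math.ceil(ram_word_width / MACRO_WIDTH)

for width in (128, 256, 512, 1024):
    print(f"{width}-bit RAM word -> {n_macros(width)} macro(s)")
```

Capping dataWaysPerRamWord at 128 / XLEN thus keeps each RAM word within a single best-performing macro, independent of the cache line width.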


Bill94l commented Jun 11, 2025

@cfuguet Thank you for your quick responses!

On my side, I updated the HPDcache module based on the last commit 882aba1c0cc912bef82dd4e4fec4df4c67148920.
With this version, Linux boots successfully with cache line sizes of 128, 256, 512, and 1024 bits.


cfuguet commented Jun 11, 2025

Great ! Thank you @Bill94l for the update 😄


❌ failed run, report available here.

1 similar comment

❌ failed run, report available here.


❌ failed run, report available here.

2 similar comments

github-actions bot commented Jul 1, 2025

❌ failed run, report available here.


github-actions bot commented Jul 2, 2025

❌ failed run, report available here.

@JeanRochCoulon

One test fails; here is a part of the log file:
[screenshot of the log]


cfuguet commented Jul 2, 2025

Hi @JeanRochCoulon ,

Thank you for the information. It looks like there are some X's sent by the cache in the AXI WDATA channel. I'm checking whether I can get access to a VCS license to run the test on my end. Otherwise, if possible, could you send me a VCD file through a file-sharing system?


github-actions bot commented Jul 3, 2025

❌ failed run, report available here.

@JeanRochCoulon

@AyoubJalali do you have the possibility to deliver such a VCD file to @cfuguet?

@AyoubJalali

> @AyoubJalali do you have the possibility to deliver such a VCD file to @cfuguet?

@cfuguet could you give me the name of the test whose VCD file you want?


cfuguet commented Jul 3, 2025

Hello @AyoubJalali,

The failing tests are:
[screenshot of the failing tests]

Thanks !

@JeanRochCoulon

[screenshot of the failing tests]


JeanRochCoulon commented Jul 3, 2025

Ah ah, I answered 6 minutes after you @cfuguet!!!! You won

@cfuguet cfuguet force-pushed the dev/updt_hpdcache branch 2 times, most recently from 42b3d40 to badf6fb Compare July 3, 2025 20:57
.mem_req_read_o (mem_req_read_arb)
.mem_req_read_o (mem_req_read_arb),

.gnt_index_o (/**/)

[verible-verilog-format] reported by reviewdog 🐶

Suggested change
.gnt_index_o (/**/)
.gnt_index_o( /**/)

@cfuguet cfuguet force-pushed the dev/updt_hpdcache branch from badf6fb to 7834fc8 Compare July 3, 2025 21:04

github-actions bot commented Jul 3, 2025

❌ failed run, report available here.

2 similar comments

github-actions bot commented Jul 3, 2025

❌ failed run, report available here.


github-actions bot commented Jul 3, 2025

❌ failed run, report available here.


❌ failed run, report available here.


👋 Hi there!

This pull request seems inactive. Need more help or have updates? Feel free to let us know. If there are no updates within the next few days, we'll go ahead and close this PR. 😊

@github-actions github-actions bot added the Status:Stale Issue or PR is stale and hasn't received any updates. label Aug 15, 2025