class override (same full class name)

following up on https://lwpro2.dev/2021/12/28/maven-nested-modules/ and https://lwpro2.dev/2021/12/14/microservies-in-monorepo/, there are times when a class defined upstream (either third party or in an internal shared library) should be overridden.

The trick is to keep the class name exactly the same, with the same package name. when maven shades the classes, it picks up the class from the closest level.

For example, if module ACore has a class named org.wordpress.util.DomainMapper, a class with the exact same name org.wordpress.util.DomainMapper can be created in the A1 module to override the provided functionality.
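for illustration, a minimal sketch of what the overriding class in A1 could look like (the map method and its body here are made up, not from any real class):

package org.wordpress.util;

// lives in module A1, with the exact same package and class name as in ACore
public class DomainMapper {
    // hypothetical method, for illustration only: this override replaces
    // whatever behavior the ACore version provides
    public String map(String domain) {
        return domain.toLowerCase();
    }
}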

when building module A with -amd, it generates an A2 jar, which contains the compiled class binary from ACore, while the generated A1 jar contains the overriding class created in the A1 module.

maven nested modules

following up on https://lwpro2.dev/2021/12/14/microservies-in-monorepo/, there are times when a setup of nested modules is needed.

for example, module A could contain modules A1 and A2.

the changes/setup needed for nested maven modules are similar to the multi-module setup. however, module A now needs to be packaged with

<packaging>pom</packaging>

similarly, the submodules should be included in module A:

<parent>
    <groupId>org.xxx</groupId>
    <artifactId>ParentModule</artifactId>
    <version>1.0-SNAPSHOT</version>
</parent>
<artifactId>ModuleA</artifactId>
<packaging>pom</packaging>

<modules>
    <module>ACore</module>
    <module>A1</module>
    <module>A2</module>
</modules>

among the submodules, if A1 needs to use the code from ACore, the dependency scope should be compile, so that the A1 jar is built shaded with the classes from ACore.

<artifactId>A1</artifactId>

<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
</properties>

<dependencies>
    <dependency>
        <groupId>org.xxx</groupId>
        <artifactId>ACore</artifactId>
        <version>1.0-SNAPSHOT</version>
        <scope>compile</scope>
    </dependency>
</dependencies>

this will work within the IDE. as for build or packaging, the command to build the nested jars is

mvn clean package -pl ModuleA -am -amd

-am will include the upstream dependencies, for example module Core,

while -amd will include the dependent modules, here the nested modules ACore, A1 and A2.

microservices in monorepo

recently i spent some time breaking a monolith project into multiple modules within a single monorepo.

it was a huge codebase with several different projects commingled together in the same git repo. after the change, it is now broken into different modules, literally several microservices in a monorepo.

there are several benefits to this change. the key one is that instead of all code being shared across every project, each module now builds separately with only the code and dependencies it needs (hence a faster build time, a smaller package, and separation of concerns). at the same time, the common/core code can still be shared and maintained across modules, versus the breaking-update/compatibility issues that come with the usual package/jar sharing.

Here is how the project looked before and after.

before: monolith project

after: modules

One of the key changes is the maven multi-module setup,

where the parent pom specifies the modules to include (for both compile and runtime packaging), with itself using pom packaging.

Then for each microservice/module, refer back to its parent module, and include the shared/core module if needed.

Parent:

microservice/module:

then for the IDE and CI/CD, a build from the parent module will have the submodules executed as well (packaged, for example, when running mvn package).

to build a single module alone, run `mvn $goal -pl moduleA -am` (for example, `mvn package -pl moduleA -am`). this could be triggered either locally, for example through a file watcher for hot reload, or from CI/CD for specific branches or MRs.

gc on old gen

i have a large app which currently runs with 600GB max memory (Xmx). the app now processes at a controlled rate (every half an hour) to avoid an OOM.

each run is consuming and processing >3.2 million kafka messages in less than 5 minutes (1 or 2 minutes normally).

even after a lot of tuning, when i looked at the heap, i saw the memory footprint kept climbing. even though the eden space had very frequent (minor) GCs during the <5 minute runs, the old gen kept growing gradually.

(screenshots: whole heap, Eden, old gen)

this really seems like a memory leak.

however, after another thorough check of the code, it looks like the collections of objects which are no longer used did get dereferenced.

so if the code is right, then it looks like the gc might not be doing its job on the old gen.

so i then triggered a manual GC, which resulted in the following:

after spending some time looking into this in detail, it turns out the default occupancy ratio to trigger the gc on the old gen is >40%. and this is definitely not the ideal setup for this application: it sits idle for 25 minutes out of every 30-minute interval, yet because the gc waits on a fixed occupancy ratio to trigger, the memory is mostly wasted.

around 120GB was wasted in this case before the manual gc.

turns out there is already a proposal to tune this, JEP 346, from 2018.

http://openjdk.java.net/jeps/346

and for JVMs where that JEP is implemented, leveraging the periodic GC is a much needed practice, instead of leaving it to the gc algorithm alone:

-XX:G1PeriodicGCInterval=600000 -XX:G1PeriodicGCSystemLoadThreshold=LOAD 
     -XX:-G1PeriodicGCInvokesConcurrent
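for illustration only, here is a rough application-level sketch of the same idea (the 10-minute interval matches the flag above, while the 20% threshold and the pool lookup are my own assumptions; System.gc() is only a request to the JVM):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class PeriodicGcSketch {
    public static void main(String[] args) throws InterruptedException {
        // find the old gen pool (named "G1 Old Gen" under the G1 collector)
        MemoryPoolMXBean oldGen = ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(p -> p.getName().contains("Old Gen"))
                .findFirst()
                .orElseThrow(IllegalStateException::new);

        while (true) {
            Thread.sleep(600_000); // every 10 minutes
            double occupancy = (double) oldGen.getUsage().getUsed()
                    / oldGen.getUsage().getMax();
            if (occupancy > 0.2) { // assumed threshold, well below the default >40%
                System.gc();       // request a full GC while the app is idle
            }
        }
    }
}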

full gc with the default setting, where it's triggered at the >40% threshold:

jvm memory tuning

a big memory drainer is string objects.

with the object header and the pointer to the char array, a minimum of ~20 bytes (varies by java version) is occupied even for an empty string.
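to verify the per-string overhead on a specific JVM, the OpenJDK JOL tool can be used; a small sketch, assuming the org.openjdk.jol:jol-core dependency (numbers vary by java version and VM flags):

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.GraphLayout;

public class StringFootprint {
    public static void main(String[] args) {
        // layout of the String object itself (header + fields)
        System.out.println(ClassLayout.parseInstance("").toPrintable());
        // total retained size, including the backing array
        System.out.println(GraphLayout.parseInstance("").totalSize());
    }
}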

this can become an especially big problem if a large volume of messages (like millions of records) is parsed on a single jvm.

from java 8, there are two ways to handle this (especially for situations where a large amount of data shares, for example, the same headers, like “portfolio”, “name”, “currency”; these are likely to have a limited/constant number of variations for both the keys/attributes/properties and the values).

  1. string intern, which is the approach from before java 8

one caveat though: java’s default string intern can be slow, as it is a native implementation.

an alternative to the native implementation is using a map, which serves the same purpose and is faster, like

import java.util.concurrent.ConcurrentHashMap;

public final class StringRepo extends ConcurrentHashMap<String, String> {
    public static final StringRepo repo = new StringRepo();

    public String intern(String s) {
        // note: ConcurrentHashMap rejects null keys, so callers need to handle null
        // map the string to itself, so this map is the pool,
        // instead of delegating to the slower native String.intern()
        return computeIfAbsent(s, k -> k);
    }
}
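for example, when parsing messages with repetitive headers (the record map below is hypothetical):

// each distinct header value is stored once; repeats reuse the pooled instance
String portfolio = StringRepo.repo.intern(record.get("portfolio"));
String currency = StringRepo.repo.intern(record.get("currency"));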

2. from java 8, string deduplication can be used to get the gc’s help in reducing the string memory footprint:

-XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics

(screenshot: string deduplication statistics with GC)

https://github.com/FasterXML/jackson-core/issues/726

protobuf NPE

What language does this apply to?
Java

If it’s a proto syntax change, is it for proto2 or proto3?
proto3

If it’s about generated code change, what programming language?
Java

Describe the problem you are trying to solve.
For the message below,

message Position {
    string portfolio = 1;
}

The generated setter would be something like this

      public Builder setPortfolio(
          java.lang.String value) {
        if (value == null) {
         throw new NullPointerException();
        }
  
        portfolio_ = value;
        onChanged();
        return this;
      }

There is a `throw new NullPointerException()` within the method.

I think this is really an opinionated approach; it should instead be left to developers to decide whether to handle null or throw an NPE.
There could be a position message, for example, with many known optional fields which could be null. Developers are in a better position to decide how those fields should be set.
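for illustration, this is the caller-side guard the current behavior forces today (a sketch using the generated builder for the Position message above; the portfolio variable is hypothetical):

      Position.Builder builder = Position.newBuilder();
      // guard needed by the caller, since the generated setter throws on null
      if (portfolio != null) {
          builder.setPortfolio(portfolio);
      }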

Describe the solution you’d like

The generated class should take the value to be set as it is. Something like

      public Builder setPortfolio(
          java.lang.String value) {
//        if (value == null) {
//         throw new NullPointerException();
//        }
  
        portfolio_ = value;
        onChanged();
        return this;
      }

Describe alternatives you’ve considered

Additional context
Add any other context or screenshots about the feature request here.

I guess the current “opinionated” approach could be due to a constraint of the protobuf wire format, where an int is used to determine the length-delimited value’s length. If not, I think introducing a negative int (-1) could tell whether the following value is really empty (0) or null (-1).

https://github.com/protocolbuffers/protobuf/issues/9207

thread safe sorted map

besides using Collections to get a synchronized version of a `TreeMap`, another approach to get output sorted by key is to do the reverse: keep an unsorted map and sort at output time using a helper class.

for example, using ObjectMapper

//set the ordering
mapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);

then to get the output, or to transform to another object, use the `mapper`

//output
mapper.writeValueAsString(pairs)

//transform
mapper.convertValue(pairs, Map.class)
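putting it together, a minimal runnable sketch (the sample keys/values are made up):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SortedOutput {
    public static void main(String[] args) throws Exception {
        // thread safe, but iteration order is not sorted
        Map<String, Integer> pairs = new ConcurrentHashMap<>();
        pairs.put("b", 2);
        pairs.put("a", 1);
        pairs.put("c", 3);

        ObjectMapper mapper = new ObjectMapper();
        mapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);

        // prints {"a":1,"b":2,"c":3} regardless of the map's iteration order
        System.out.println(mapper.writeValueAsString(pairs));
    }
}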

intellij compiled class

when I finished updating a class in IntelliJ, I was expecting the hot reload (spring devtools) to kick in; however, that didn’t happen automatically.

after a further look, it seems that when a class is updated, IntelliJ stores the updated class file (or maybe only the delta) in a different place, probably in memory.

the class file on the file system is not updated and still has the old timestamp, hence the file watcher won’t kick off the reload.

a rebuild of the project then writes out any updated classes; the timestamps get updated, and subsequently the reload happens.

spring kafka seek

I have a kafka consumer which, after resume, picks up from the last committed offset and consumes sequentially from there, even with several calls to seekToEnd().

Logs

//before pause
2021-10-09 15:44:18,569 [fx-kafka-consumer-0-C-1] INFO  c.x.f.r.k.HarvestConsumer - check raw msg size 1 on [17] @ [38012024115]   
2021-10-09 15:44:28,573 [fx-kafka-consumer-0-C-1] INFO  c.x.f.r.k.HarvestConsumer - check raw msg size 1 on [17] @ [38012024116]   

//paused

//called seekToEnd before resume

//after resume
//called seekToEnd several times before and after resume
2021-10-09 15:45:13,603 [kafka-consumer-0-C-1] INFO  c.x.f.r.k.HarvestConsumer - check raw msg size 1 on [17] @ [38012024117]   
2021-10-09 15:45:53,610 [kafka-consumer-0-C-1] INFO  c.x.f.r.k.HarvestConsumer - check raw msg size 1 on [17] @ [38012024118]  
2021-10-09 15:46:03,612 [kafka-consumer-0-C-1] INFO  c.x.f.r.k.HarvestConsumer - check raw msg size 1 on [17] @ [38012024119]   
2021-10-09 15:46:13,613 [kafka-consumer-0-C-1] INFO  c.x.f.r.k.HarvestConsumer - check raw msg size 1 on [17] @ [38012024120] 

Turns out this is actually due to an issue in my implementation of the subclass extending AbstractConsumerSeekAware.

At consumer startup, I did a manual seek in onPartitionsAssigned. This had so far been working well.

However, now that there is a need to pause/resume the consumer, it turns out the overridden method doing the manual seek has been skipping the superclass logic that maintains the Consumer in the callbacks field.

When seekToEnd was invoked, it was really iterating through an empty map.
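a minimal sketch of the fix, assuming a listener extending AbstractConsumerSeekAware (the seek logic itself is illustrative):

import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.listener.AbstractConsumerSeekAware;

public class SeekAwareListener extends AbstractConsumerSeekAware {

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments,
                                     ConsumerSeekCallback callback) {
        // the missing piece: without this super call, the callbacks field
        // stays empty and a later seekToEnd() iterates an empty map
        super.onPartitionsAssigned(assignments, callback);

        // the manual seek at startup (illustrative only)
        assignments.forEach((tp, offset) ->
                callback.seek(tp.topic(), tp.partition(), offset));
    }
}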

sql server parameter sniffing

i was using PreparedStatement to load some data out of a sql server database, which had been working fine for quite some time. till recently, for one day’s data, the query was not able to finish (hanging on reading the response from the database):

      "stackTrace": [
        {
          "methodName": "socketRead0",
          "fileName": "SocketInputStream.java",
          "lineNumber": -2,
          "className": "java.net.SocketInputStream",
          "nativeMethod": true
        },
        {
          "methodName": "socketRead",
          "fileName": "SocketInputStream.java",
          "lineNumber": 116,
          "className": "java.net.SocketInputStream",
          "nativeMethod": false
        },
        {
          "methodName": "read",
          "fileName": "SocketInputStream.java",
          "lineNumber": 171,
          "className": "java.net.SocketInputStream",
          "nativeMethod": false
        },
        {
          "methodName": "read",
          "fileName": "SocketInputStream.java",
          "lineNumber": 141,
          "className": "java.net.SocketInputStream",
          "nativeMethod": false
        },
        {
          "methodName": "read",
          "fileName": "IOBuffer.java",
          "lineNumber": 2058,
          "className": "com.microsoft.sqlserver.jdbc.TDSChannel",
          "nativeMethod": false
        },
        {
          "methodName": "readPacket",
          "fileName": "IOBuffer.java",
          "lineNumber": 6617,
          "className": "com.microsoft.sqlserver.jdbc.TDSReader",
          "nativeMethod": false
        },
        {
          "methodName": "nextPacket",
          "fileName": "IOBuffer.java",
          "lineNumber": 6567,
          "className": "com.microsoft.sqlserver.jdbc.TDSReader",
          "nativeMethod": false
        },
        {
          "methodName": "ensurePayload",
          "fileName": "IOBuffer.java",
          "lineNumber": 6540,
          "className": "com.microsoft.sqlserver.jdbc.TDSReader",
          "nativeMethod": false
        },
        {
          "methodName": "skip",
          "fileName": "IOBuffer.java",
          "lineNumber": 7200,
          "className": "com.microsoft.sqlserver.jdbc.TDSReader",
          "nativeMethod": false
        },
        {
          "methodName": "skipValue",
          "fileName": "dtv.java",
          "lineNumber": 3362,
          "className": "com.microsoft.sqlserver.jdbc.ServerDTVImpl",
          "nativeMethod": false
        },
        {
          "methodName": "skipValue",
          "fileName": "dtv.java",
          "lineNumber": 162,
          "className": "com.microsoft.sqlserver.jdbc.DTV",
          "nativeMethod": false
        },
        {
          "methodName": "skipValue",
          "fileName": "Column.java",
          "lineNumber": 152,
          "className": "com.microsoft.sqlserver.jdbc.Column",
          "nativeMethod": false
        },
        {
          "methodName": "skipColumns",
          "fileName": "SQLServerResultSet.java",
          "lineNumber": 216,
          "className": "com.microsoft.sqlserver.jdbc.SQLServerResultSet",
          "nativeMethod": false
        },
        {
          "methodName": "loadColumn",
          "fileName": "SQLServerResultSet.java",
          "lineNumber": 770,
          "className": "com.microsoft.sqlserver.jdbc.SQLServerResultSet",
          "nativeMethod": false
        },
        {
          "methodName": "getterGetColumn",
          "fileName": "SQLServerResultSet.java",
          "lineNumber": 2036,
          "className": "com.microsoft.sqlserver.jdbc.SQLServerResultSet",
          "nativeMethod": false
        },
        {
          "methodName": "getValue",
          "fileName": "SQLServerResultSet.java",
          "lineNumber": 2054,
          "className": "com.microsoft.sqlserver.jdbc.SQLServerResultSet",
          "nativeMethod": false
        },
        {
          "methodName": "getValue",
          "fileName": "SQLServerResultSet.java",
          "lineNumber": 2040,
          "className": "com.microsoft.sqlserver.jdbc.SQLServerResultSet",
          "nativeMethod": false
        },
        {
          "methodName": "getString",
          "fileName": "SQLServerResultSet.java",
          "lineNumber": 2525,
          "className": "com.microsoft.sqlserver.jdbc.SQLServerResultSet",
          "nativeMethod": false
        },

the same query, however, works well when run plainly in a sql tool, or when run using a plain java Statement.

it wasn’t until I added a dummy where clause “… and 1=1” that the preparedstatement was suddenly able to return the result in time again.

in the beginning, i thought it was the dummy clause which, strangely, made the difference. turns out, this was a problem with sql server parameter sniffing.

the dummy where clause worked only because sql server now sees it as a different query, hence not reusing the previously cached execution plan.

this can be confirmed by adding

option (recompile)

to the query. this triggers sql server to compile a fresh execution plan instead of reusing the cached one; as such, even the original query, without the dummy where clause, is now performing again.
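for reference, a hedged JDBC sketch of the same check from java (the url, table and column names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class RecompileCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://host:1433;databaseName=db;user=u;password=p";
        // OPTION (RECOMPILE) forces a fresh plan for this statement
        String sql = "SELECT * FROM positions WHERE cob_date = ? OPTION (RECOMPILE)";

        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDate(1, java.sql.Date.valueOf("2021-11-01"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process the row
                }
            }
        }
    }
}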

parameter sniffing can also be turned off at the database level (see the link below); the current database-scoped configurations can be inspected with

SELECT * FROM sys.database_scoped_configurations;

https://www.databasejournal.com/features/mssql/turning-off-parameter-sniffing-for-a-sql-server-database-by-default.html

==========================

looking further into this, the execution was stuck using the previous execution plan.

seems like the first plan, in blue, was not working out; finally it created a new plan, in green.

and the wait for the blue plan was HTDELETE. this relates to a change made in SQL server 2014:

SQL Server 2014 now uses one shared hash table instead of per-thread copy. This provides the benefit of significantly lowering the amount of memory required to persist the hash table but, as you can imagine, the multiple threads depending on that single copy of the hash table must synchronize with each other before, for example, deallocating the hash table. To do so, those threads wait on the HTDELETE (Hash Table DELETE) wait type.

https://social.msdn.microsoft.com/Forums/en-US/8579f864-cbdc-49b9-be26-d47af61df56d/sql-server-2014-wait-info-htdelete?forum=sql14db

likely this is a bug in SQL server, which was hit when I ran the program using parallelStream(), while stream() would work.*

*There are multiple processes hitting the same query.
