@jcflack jcflack commented Jan 24, 2022

As a work-in-progress pull request, this is not expected to be imminently merged, but is here to document the objectives and progress of the ongoing work.

Why needed

A great advantage promised by a PL based on the JVM is the large ecosystem of languages other than Java that can be supported on the same infrastructure, whether through the Java Scripting (JSR 223) API, or through the polyglot facilities of GraalVM, or simply via separate compilation to the class file format and loading as jars.

However, PL/Java, with its origins in 2004 predating most of those developments, has architectural limitations that stand in the way.

JDBC

One of the limitations is the centrality of the JDBC API. To be sure, it is a standard in the Java world for access to a database, and for PL/Java to conform to ISO SQL/JRT, the JDBC API must be available. But it is not necessarily a preferred or natural database API for other JVM or GraalVM languages, and its design goal is to abstract away from the specifics of an underlying database, which ends up complicating or even preventing access to advanced PostgreSQL capabilities that could be prime drivers for running server-side code in the first place.

The problem is not that JDBC is an available API in PL/Java, but that it is the fundamental API in PL/Java, with its tentacles reaching right into the native C language portion of PL/Java's implementation. That has made alternative interface options impractical, and multiplied the maintenance burden of even simple tasks like adding support for new datatype mappings or fixing simple bugs. There are significant portions of JDBC 4 that remain unimplemented in PL/Java.

Experience building an implementation of ISO SQL/XML XMLQUERY showed that certain requirements of the spec were simply unsatisfiable atop JDBC, either because of inherent JDBC limitations or limits in PL/Java's implementation of it. An example of each kind:

  • The INTERVAL data type cannot be mapped as SQL/XML requires, because the only ResultSetMetaData methods JDBC defines for access to a type modifier are precision and scale, which apply to numeric values; the API defines no standard way to learn what the modifier of an INTERVAL says about whether months or days are present.
  • The DECIMAL type cannot be mapped as SQL/XML requires; for that case, the fault is not with JDBC (which defines the precision and scale methods), but with their incomplete implementation in PL/Java.

Those cases also illustrate that mapping some PostgreSQL data types to those of another language can be complex. An arbitrary PostgreSQL INTERVAL is representable as neither a java.time.Period nor a java.time.Duration alone (though a pair of the two can be used, a type that PGJDBC-NG offers). One or the other can suffice if the type modifier is known and limits the fields present. A PostgreSQL NUMERIC value has not-a-number and signed infinity values that some candidate language-library type might not, and an internal precision that its text representation does not reveal, which might need to be preserved for a mathematically demanding task. The details of converting it to another language's similar type need to be knowable or controllable by an application.
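The Period-plus-Duration pairing mentioned above can be sketched in plain java.time terms. This PgInterval holder is purely illustrative (it is neither the PGJDBC-NG type nor anything in this PR); it only shows why the three independent INTERVAL fields need both classes.

```java
import java.time.Duration;
import java.time.Period;

// Sketch: a PostgreSQL INTERVAL carries months, days, and microseconds
// as independent fields; neither Period nor Duration alone can hold all
// three, but a pair of them can. PgInterval here is purely illustrative.
public final class PgInterval {
    private final Period period;     // months + days (calendar-relative part)
    private final Duration duration; // microseconds (exact-time part)

    public PgInterval(int months, int days, long micros) {
        this.period = Period.of(0, months, days).normalized();
        this.duration = Duration.ofSeconds(micros / 1_000_000L,
                                           (micros % 1_000_000L) * 1_000L);
    }

    public Period period()     { return period; }
    public Duration duration() { return duration; }

    public static void main(String[] args) {
        PgInterval iv = new PgInterval(14, 3, 90_000_000L); // 1y 2m 3d 90s
        System.out.println(iv.period() + " " + iv.duration());
    }
}
```

Period carries the calendar-relative months and days; Duration carries the exact microseconds; collapsing either into the other would change the value's meaning under daylight-saving shifts or varying month lengths.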

It is a goal of this work to give PL/Java an API that does not obscure or abstract from PostgreSQL details, but makes them accessible in a natural Java idiom, and that such a "natural PostgreSQL" API should be adequate to allow building a JDBC layer in pure Java above it. (The work of building such a JDBC layer is not in the scope of this pull request.)

Parameter and return-value mapping

PL/Java uses a simple, Java-centric approach where a Java method is declared naturally, giving ordinary Java types for its parameters and return, and the mappings from these to the PostgreSQL parameter and return types are chosen by PL/Java and applied transparently (and much of that happens deep in PL/Java's C code).

While convenient, that approach isn't easily adapted to other JVM languages that may offer other selections of types. Even for Java, it stands in the way of doing certain things possible in PostgreSQL, like declaring VARIADIC "any" functions.

In a modernized API, it needs to be possible to declare a function whose parameter represents the PostgreSQL FunctionCallInfo, so that the parameters and their types can be examined and converted in Java. That will make it possible to write language handlers in Java, whether for other JVM languages or for the existing PL/Java calling conventions that at present are tangled in C.

Elements of new API

Identification of data types

A PostgreSQL-specific API must be able to refer unambiguously to any type known to the database, so it cannot rely on any fixed set of generic types such as JDBCType. To interoperate with a JDBC layer, though, the identifier for types should implement JDBC's SQLType interface.

The API should support retrieving enough metadata about the type for a JDBC layer implemented above it to be able to report complete ResultSetMetaData information.

The new class serving this purpose is RegType.

As RegType implements the java.sql.SQLType interface, an aliasing issue arises for a JDBC layer. Such a layer should accept JDBCType.VARCHAR as an alias for RegType.VARCHAR, for example. JDBC itself has no methods that return an SQLType instance, so the question of whether it should return the generic JDBC type or the true RegType does not arise. A PL/Java-specific API is needed for retrieving the type identifier in any case.
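A JDBC layer's aliasing check might look like the following sketch. No real RegType appears here, only a hand-rolled SQLType stand-in, and the name-based aliasing rule is an assumption for illustration, not the rule the eventual JDBC layer will use.

```java
import java.sql.JDBCType;
import java.sql.SQLType;

// Sketch of the aliasing question: a JDBC layer over this API might
// accept the generic JDBCType.VARCHAR wherever the true RegType for
// varchar is expected. VARCHAR_REGTYPE is a stand-in, not the real class.
public final class TypeAlias {
    // A minimal SQLType implementation standing in for RegType.VARCHAR.
    public static final SQLType VARCHAR_REGTYPE = new SQLType() {
        public String getName()              { return "varchar"; }
        public String getVendor()            { return "org.postgresql"; }
        public Integer getVendorTypeNumber() { return 1043; } // varchar's pg_type OID
    };

    // One plausible aliasing rule: match on the standard JDBC type name.
    public static boolean aliases(SQLType generic, SQLType pgType) {
        return generic.getName().equalsIgnoreCase(pgType.getName());
    }

    public static void main(String[] args) {
        System.out.println(aliases(JDBCType.VARCHAR, VARCHAR_REGTYPE)); // true
    }
}
```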

The details of which JDBC types are considered aliases of which RegTypes will naturally belong in a JDBC API layer. At the level of this underlying API, a RegType is what identifies a PostgreSQL type.

While RegType includes convenience final fields for a number of common types, those by no means limit the RegTypes available. There is a RegType that can be obtained for every type known to the database, whether built in, extension-supplied, or user-defined.

Other PostgreSQL catalog objects and key abstractions

RegType is one among the types of PostgreSQL catalog objects modeled in the org.postgresql.pljava.model package.

Along with a number of catalog object types, the package also contains:

  • TupleDescriptor and TupleTableSlot, the key abstractions for fetching and storing database values. TupleTableSlot in PostgreSQL is already a useful abstraction over a few different representations; in PL/Java it is further abstracted, and can present with the same API other collections of typed, possibly named, items, such as arrays, the arguments in a function call, etc.
  • MemoryContext and ResourceOwner, both subtypes of Lifespan, usable to guard Java objects that have native state whose validity is bounded in time
  • CharsetEncoding

Mapping PostgreSQL data types to what a PL supports

The Adapter class

A mapping between a PostgreSQL data type and a suitable PL data type is an instance of the Adapter class, and more specifically of the reference-returning Adapter.As<T,U> or one of the primitive-returning Adapter.AsInt<U>, Adapter.AsFloat<U>, and so on (one for each Java primitive type). The Java type produced is T for the As case, and implicit in the class name for the AsFoo cases.

The basic method for fetching a value from a TupleTableSlot is get(Attribute att, Adapter adp), naturally overloaded and generic so that get with an As<T,?> adapter returns a T, get with an AsInt<?> adapter returns an int, and so on. (A later comment in this thread describes a better API than this item-at-a-time approach.) (The U type parameter of an adapter plays a role when adapters are combined by composition, as discussed below, and is otherwise usually uninteresting to client code, which may wildcard it, as seen above.)
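The overload pattern can be mocked in a few lines of plain Java. These stand-in As/AsInt/Slot classes (no Attribute parameter, no real datums) are illustrative only; they show how the compiler selects the reference-returning or primitive-returning get.

```java
// Minimal mock of the overload pattern described above: the real API's
// get(Attribute, Adapter) is overloaded and generic so that a reference
// adapter returns T while a primitive adapter returns int with no boxing.
public final class GetOverloads {
    public static abstract class As<T, U> { public abstract T adapt(Object datum); }
    public static abstract class AsInt<U> { public abstract int adapt(Object datum); }

    public static final As<String, Void> TEXT = new As<String, Void>() {
        public String adapt(Object d) { return d.toString(); }
    };
    public static final AsInt<Void> INT4 = new AsInt<Void>() {
        public int adapt(Object d) { return ((Number) d).intValue(); }
    };

    public static class Slot {
        private final Object datum;
        public Slot(Object datum) { this.datum = datum; }
        public <T> T get(As<T, ?> adp) { return adp.adapt(datum); } // returns T
        public int   get(AsInt<?> adp) { return adp.adapt(datum); } // returns int
    }

    public static void main(String[] args) {
        Slot s = new Slot(42);
        String t = s.get(TEXT); // compile-time type String
        int    i = s.get(INT4); // compile-time type int, no boxing
        System.out.println(t + " " + i);
    }
}
```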

A manager class for adapters

Natural use of this idiom presumes there will be some adapter-manager API that allows client code to request an adapter for some PostgreSQL type by specifying a Java witness class Class<T> or some form of super type token, and returns the adapter with the expected compile-time parameterized type.

That manager hasn't been built yet, but the requirements are straightforward and no thorny bits are foreseen. (Within the org.postgresql.pljava.internal module itself, things are simpler; no manager is needed, and code refers directly to static final INSTANCE fields of existing adapters.)

Extensibility

PL/Java has historically supported user-defined types implemented in Java, a special class of data types whose Java representations must implement a certain JDBC interface and import and export values through a matching JDBC API. In contrast, PL/Java's first-class PostgreSQL data type support—the mappings it supplies between PostgreSQL and ordinary Java types that don't involve the specialized JDBC user-defined type APIs—has been hardcoded in C using Java Native Interface (JNI) calls, and not straightforward to extend. That's a pain point for several situations:

  • A mapping for another PostgreSQL data type (either a type newly added to PostgreSQL, or simply one that PL/Java does not yet have a mapping for) is not easily added for an application that needs it, but generally must be added in PL/Java's C/JNI internals and made available in a new PL/Java build.
  • A mapping of an existing PostgreSQL data type to a new or different Java type—same story. When Java 8 introduced the java.time package, developers wishing to have PL/Java map PostgreSQL's date and time types to the improved Java types instead of the older java.sql ones had to open issues requesting that ability and wait for a PL/Java release to include it.
  • Not every PostgreSQL data type has a single best PL type to be mapped to. One application using the geometric types might want them mapped to the Java types in the PGJDBC library, while another might prefer the 2D classes supplied by some Java geometry library. One application might want a PostgreSQL array mapped to a flat Java List, another to a multi-dimensioned Java array, another to a matrix class from a scientific computation library. The choices multiply when considering the data types not only of Java but of other JVM languages. C coding and rebuilding of PL/Java should not be needed to tailor these mappings.

Adapters implementable in pure Java

With this PR, code external to PL/Java's implementation can supply adapters, built against the service-provider API exposed in org.postgresql.pljava.adt.spi.

Leaf adapters

A "leaf" adapter is one that directly knows the PostgreSQL datum format of its data type, and maps that to a suitable PL type. Only a leaf adapter gets access to PostgreSQL datums, which it should not leak to other code. Code that defines leaf adapters must be granted a permission in pljava.policy.

Composing adapters

A composing, or non-leaf, adapter is one meant to be composed over another adapter. An example would be an adapter that composes over an adapter returning type T (possibly null) to form an adapter returning Optional<T>. With a selection of common composing adapters (there aren't any in this pull request, yet), it isn't necessary to provide leaf adapters covering all the ways application code might want data to be presented. No special permission is needed to create a composing adapter.
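A minimal sketch of the composing idea, with the adapter reduced to a plain function from the underlying value to the mapped type; the real Adapter.As also carries and tracks the parameterized types, and nothing here touches a PostgreSQL datum.

```java
import java.util.Optional;

public final class Composing {
    // Reduced adapter shape for illustration only.
    public interface As<T, U> { T adapt(U under); }

    // A composing adapter: wraps any adapter producing T (possibly null)
    // into one producing Optional<T>. It never sees a PostgreSQL datum.
    public static <T, U> As<Optional<T>, U> optional(As<T, U> over) {
        return u -> Optional.ofNullable(over.adapt(u));
    }

    // A pretend underlying adapter that can produce null.
    public static final As<String, Object> TEXTISH =
        d -> d == null ? null : d.toString();

    public static void main(String[] args) {
        As<Optional<String>, Object> opt = optional(TEXTISH);
        System.out.println(opt.adapt("x"));  // Optional[x]
        System.out.println(opt.adapt(null)); // Optional.empty
    }
}
```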

Java's generic types are erased to raw types at runtime, but the Java compiler records the parameterizations in the class file, where they remain accessible through reflection. As adapters are composed, the Adapter class tracks the type relationships so that, for example, an Adapter<Optional<T>,T> composed over an Adapter<String,Void> is known to produce Optional<String>.

It is that information that will allow an adapter manager to satisfy a request to map a given PostgreSQL type to some PL type, by finding and composing available adapters.

Contract-based adapters

For a PostgreSQL data type that doesn't have one obvious best mapping to a PL type (perhaps because there are multiple choices with different advantages, or because there is no suitable type in the PL's base library, and any application will want the type mapped to something in a chosen third-party library), a contract-based adapter may be best. An Adapter.Contract is a functional interface with parameters that define the semantically-important components of the PostgreSQL type, and a generic return type, so an implementation can return any desired representation for the type.

A contract-based adapter is a leaf adapter class with a constructor that accepts a Contract, producing an adapter between the PostgreSQL type and whatever PL type the contract maps it to. The adapter encapsulates the internal details of how a PostgreSQL datum encodes the value, and the contract exposes the semantic details needed to faithfully map the type. Contracts for many existing PostgreSQL types are provided in the org.postgresql.pljava.adt package.
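The division of labor can be sketched with an invented "point" type: the contract fixes the semantic components (x and y), the pretend leaf adapter knows the (here invented) datum layout, and a lambda chooses the final representation. Nothing in this sketch is the real API.

```java
// Sketch of the contract idiom: the adapter knows how a (here,
// imaginary) datum encodes the value; the contract fixes what the
// semantic components are; a lambda chooses the representation.
public final class ContractSketch {
    @FunctionalInterface
    public interface PointContract<T> { T construct(double x, double y); }

    // A pretend "leaf adapter": decodes a datum invented here as a
    // two-element double array, then hands the components to the contract.
    public static <T> T adaptPoint(double[] datum, PointContract<T> contract) {
        return contract.construct(datum[0], datum[1]);
    }

    public static void main(String[] args) {
        double[] datum = { 3.0, 4.0 };
        // Same datum, two representations, chosen by the contract lambda:
        double[] asArray = adaptPoint(datum, (x, y) -> new double[] { x, y });
        String   asText  = adaptPoint(datum, (x, y) -> x + "," + y);
        System.out.println(asText);
    }
}
```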

ArrayAdapter

The one supplied ArrayAdapter is contract-based. While a Contract.Array has a single abstract method, and therefore could serve as a functional interface, in practice it is not directly implementable by a lambda; there must be a subclass or subinterface (possibly anonymous) whose type parameterization the Java compiler can record. (A lambda may then be used to instantiate that.) An instance of ArrayAdapter is constructed by supplying an adapter for the array's element type along with an array contract targeting some kind of collection of the mapped type. As with a composing adapter, the Adapter class substitutes the element adapter's target Java type through the type parameters of the array contract, to arrive at the actual parameterized type of the resulting array or collection.
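The lambda limitation is ordinary Java behavior and can be demonstrated standalone: the compiler records an anonymous subclass's type parameterization in the class file, while a lambda's class reports only the raw interface. This demo uses a plain generic interface, not the real Contract.Array, but the mechanism is the same.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public final class TypeTokenDemo {
    public interface Contract<T> { T make(); }

    // Returns the recorded type argument, or null if none was recorded.
    public static Type recordedArgument(Contract<?> c) {
        Type t = c.getClass().getGenericInterfaces()[0];
        if (t instanceof ParameterizedType)
            return ((ParameterizedType) t).getActualTypeArguments()[0];
        return null; // a lambda's class reports only the raw interface
    }

    public static void main(String[] args) {
        Contract<String> anon = new Contract<String>() { // anonymous subclass
            public String make() { return ""; }
        };
        Contract<String> lambda = () -> "";
        System.out.println(recordedArgument(anon));   // java.lang.String
        System.out.println(recordedArgument(lambda)); // null
    }
}
```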

PostgreSQL arrays can be multidimensional, and are regular (not "jagged"; all sub-arrays at a given dimension match in size). They can have null elements, which are tracked in a bitmap, offering a simple way to save some space for arrays that are sparse; there are no other, more specialized sparse-array provisions.

Array indices need not be 0- or 1-based; the base index as well as the index range can be given independently for each dimension. PostgreSQL creates 1-based arrays by default. This information is stored with the array value, not with the array type, so a column declared with an array type could conceivably have values of different cardinalities or even dimensionalities.
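Per-dimension sizes and lower bounds make flat addressing a small exercise. This helper, along the lines of the indexing function contemplated in the open items below, computes a 0-based row-major index; the shape of any eventual API is an assumption here.

```java
public final class ArrayIndexing {
    // Computes a 0-based row-major flat index from per-dimension indices,
    // given each dimension's size and lower bound (as PostgreSQL stores
    // them with the array value).
    public static int flatIndex(int[] sizes, int[] lowerBounds, int[] idx) {
        int flat = 0;
        for (int d = 0; d < sizes.length; d++) {
            int i = idx[d] - lowerBounds[d];     // shift to 0-based
            if (i < 0 || i >= sizes[d])
                throw new IndexOutOfBoundsException("dimension " + d);
            flat = flat * sizes[d] + i;          // row-major accumulation
        }
        return flat;
    }

    public static void main(String[] args) {
        // A 2x3 array with PostgreSQL's default 1-based indices:
        int[] sizes = { 2, 3 }, lb = { 1, 1 };
        System.out.println(flatIndex(sizes, lb, new int[] { 2, 3 })); // 5
    }
}
```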

The adapter is contract-based because there are many ways application code could want a PostgreSQL array to be presented: as a List or single Java array (flattening multiple dimensions, if present, to one, and disregarding the base index), as a Java array-of-arrays, as a JDBC Array object (which does not officially contemplate more than one array dimension, but PostgreSQL's JDBC drivers have used it to represent multidimensioned arrays), as the matrix type offered by some scientific computation library, and so on.

For now, one predefined contract is supplied, AsFlatList, and a static method, nullsIncludedCopy, that can be used (via method reference) as one implementation of that contract.
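What a nulls-included flat copy amounts to can be sketched over a plain Java array-of-arrays. This stands in for AsFlatList and nullsIncludedCopy, whose real input is the adapter-mapped element sequence rather than a Java array.

```java
import java.util.ArrayList;
import java.util.List;

public final class FlatListSketch {
    // All elements, in storage order, flattened to one dimension with
    // nulls preserved and per-dimension bounds disregarded.
    public static <T> List<T> nullsIncludedCopy(T[][] twoDim) {
        List<T> flat = new ArrayList<>();
        for (T[] row : twoDim)
            for (T elem : row)
                flat.add(elem); // nulls preserved, dimensions dropped
        return flat;
    }

    public static void main(String[] args) {
        Integer[][] v = { { 1, null }, { 3, 4 } };
        System.out.println(nullsIncludedCopy(v)); // [1, null, 3, 4]
    }
}
```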

Java array-of-arrays

While perhaps not an extremely efficient way to represent multidimensional arrays, the Java array-of-arrays approach is familiar, and benefits from a bit of dedicated support in Adapter. If you have an Adapter a that renders a PostgreSQL type Foo as Java type Bar, you can use, for example, a.a2().build() to obtain an Adapter from the PostgreSQL array type Foo[] to the Java type Bar[][]. The resulting adapter requires the PostgreSQL array to have two dimensions, allows each value to have different sizes along those dimensions, but disregards the PostgreSQL array's start indices (all Java arrays start at 0).

Because PostgreSQL stores the dimension information with each value and does not enforce it for a column as a whole, a column of array values can include values with differing numbers of dimensions, which an adapter constructed this way will reject. On the other hand, PostgreSQL also allows the sizes along each dimension to vary from one value to the next, and this adapter accommodates that, as long as the number of dimensions doesn't change.

The existing contract-based ArrayAdapter is used behind the scenes, but build() takes care of generating the contract. Examples are provided.

Adapter maintainability

Providing pure-Java adapters that know the internal layouts of PostgreSQL data types, without relying on JNI calls and the PostgreSQL native support routines, entails a parallel-implementation maintenance responsibility roughly comparable to that of PostgreSQL client drivers that support binary send and receive. (The risk is slightly higher because the backend internal layouts are less committed than the send/receive representations. Because they are used for data on disk, though, historically they have not changed often or capriciously.)

The engineering judgment is that the resulting burden will be manageable, and the benefits in clarity and maintainability of the pure-Java implementations, compared to the brittle legacy Java+C+JNI approach, will predominate. The process of developing clear contracts for PostgreSQL types already has led to discovery of one bug (#390) that could be fixed in the legacy conversions.

For the adapters supplied in the org.postgresql.pljava.internal module, it is possible to use ModelConstants.java/ModelConstants.c to ensure that key constants (offsets, flags, etc.) stay synchronized with their counterparts in the PostgreSQL C code.

Adapter is a class in the API module, with the express intent that other adapters can be developed, and found by the adapter manager through a ServiceLoader API, without being internal to PL/Java. Those might not have the same opportunity for build-time checking against PostgreSQL header files, and will have to rely more heavily on regression tests for key data values, much as binary-supporting client drivers must. The same can be true even for PL/Java internal adapters for a few PostgreSQL data types whose C implementations are so strongly encapsulated (numeric comes to mind) that necessary layouts and constants do not appear in .h files.

Known open items

In no well-defined order ....

  • The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor. These all have API and implementation for getting PostgreSQL values and presenting them in Java. Now the other direction is needed.
  • Provide API and implementation for a unified list-of-slots representation for a variety of list-of-tuple representations used in PostgreSQL, by:
    • factoring out the list-of-TupleTableSlot classes currently found as preliminary scaffolding in TupleTableSlot.java
    • providing such a representation for SPITupleTable ...
    • CatCList ...
    • Tuplestore? ...
    • ...?
  • Implement some form of offset memoization so fetching attributes from a heap TupleTableSlot stays subquadratic
  • Finish the unimplemented grants methods of RegRole and the unimplemented unary one of CatalogObject.AccessControlled. (Needs the CatCList support, for pg_auth_members searches.)
  • A NullableDatum flavor of TupleTableSlot. One of the last prerequisites to enable pure-Java language-handler implementations, to which the function arguments will appear as a TupleTableSlot.
  • Complete the implementation of isSubtype with the rules from Java Language Specification 4.10. (At present it is a stub that only checks erased subtyping, enough to get things initially going.)
  • The adapter manager described above. (Requires isSubtype.)
  • Adapters for PostgreSQL types that don't have them yet (starting, perhaps, with the ones that already have contracts defined in org.postgresql.pljava.adt).
  • TextAdapter does not yet support the type modifiers for CHAR and VARCHAR. It needs a contract-based flavor that does.
  • ArrayAdapter (or Contract.Array) should supply at least one convenience method, taking a dimsAndBounds array parameter and generating an indexing function (a MethodHandle?) that has nDims integer parameters and returns an integer flat index. Other related operations? An index enumerator, etc.?
  • A useful initial set of composing adapters, such as:
    • one of the form As<Optional<T>,T>
      • implement in an example class
      • integrate into PL/Java proper
    • one extending As<T,T> that returns null for null and values unchanged
      • why? because with adapter autoboxing, it can be composed over any primitive-returning adapter to enable it to handle null, by returning its boxed form
      • implement in an example class
      • integrate into PL/Java proper
    • a set composing over primitive adapters to use a specified value in the primitive's value space to represent null.
      • implement in an example class
      • complete the set and integrate into PL/Java proper
  • More work on CatalogObject invalidation. RegClass and RegType are already invalidated selectively; probably RegProcedure should be also. PostgreSQL has a limited number of callback slots, so it would be antisocial to grab them for all the supported classes: less critical ones just depend on the global switchpoint; come up with a good story for invalidating those. Also for how TupleDescriptor should behave upon invalidation of its RegClass. See commit comments for 5adf2c8.
  • Better define and implement the DualState behavior of TupleTableSlot.
  • Reduce the C-centricity of VarlenaWrapper. Goal: DatumUtils.mapVarlena doing more in Java, less in C.
    • more of VarlenaWrapper's functionality moved to DatumImpl
    • client code no longer casting Datum.Input to VarlenaWrapper to use it.
  • Adapter should have control over the park/fetch/decompress/lifespan decisions for VarlenaWrapper; currently the behavior is hardcoded for top-transaction lifespan, lazy detoasting, appropriate for SQLXML, which was the first VarlenaWrapper client.
  • Add MBeans with statistics for the new caches

And then

  • Choose some interesting JVM language foo and implement a simple PL/foo in pure Java, using these facilities.
  • Reimplement PL/Java's own language handler the same way.

Tweak invocation.c so the stack-allocated space provided by the caller
is used to save the prior state rather than to construct the new state.
This way, the current state can have a fixed address (currentInvocation
is a constant pointer) and can be covered by a single static
ByteBuffer that Invocation.java can read/write through without relying
on JNI methods.

As Invocation isn't a JDBC-specific concept or class, it has never
made much sense to have it in the .jdbc package. Move it to .internal.

Both values have just been stashed by stashCallContext.
Both will be restored 14 lines later by _closeIteration.
And nothing in those 14 lines cares about them.

After surveying the code for where function return values can
be constructed, add one switchToUpperContext() around the construction
of non-composite SRF return values, where it was missing, so such values
can be returned correctly after SPI_finish(), and so the former,
very hacky, cross-invocation retention of SPI contexts can be sent
to pasture.

For the record, these are the notes from that survey of the code:

Function results, non-set-returning:
 Type_invoke:
  the inherited _Type_invoke calls ->coerceObject, within sTUC.
  sub"class"es that override it:
   Boolean,Byte,Double,Float,Integer,Long,Short,Void:
   - overridden in order to use appropriately-typed JNI invoke method
   - Double,Float,Long have _asDatum that does sTUC;
     . historical artifact; those types were !byval before PG 8.4
   - the rest do not sTUC; should be ok, all byval
   Coerce: does sTUC
   Composite: does sTUC around _getTupleAndClear
 Arrays:
  createArrayType (extern, in Array.c) does sTUC. So far so good.
  What about !byval elements stored into the array?
   the non-primitive/any types don't override _Array_coerceObject,
   which is where Type_coerceObject on each element, and construct_md_array
   are called. With no sTUC. Around construct_md_array is really where it's
   needed.
   But then, _Array_coerceObject is still being called within sTUC
   of _Type_invoke. All good.
   Hmm: !byval elements of values[] are leaked when pfree(values) happens.
   They should be pfree'd unconditionally; construct_md_array copies them.
 What about UDTs?
  They don't override _Type_invoke.
  So they inherit the one that calls ->coerceObject, within sTUC.
  That ought to be enough. UDT.c's coerceScalarObject itself also sTUCs,
  inconsistently, for fixed-length and varlena types but not NUL-terminated.
  That should be ok, and merely redundant. In coerceTupleObject, no sTUC
  appears. Again, by inheritance of coerceObject, that should be ok.
  Absent that, sTUC around the SQLOutputToTuple_getTuple should be adequate;
  only if that could produce a tuple with TOAST pointers would it also be
  necessary around the HeapTupleGetDatum.


Function results, set-returning:
 _datumFromSRF is applied to each row result
 The inherited _datumFromSRF calls Type_coerceObject, NOT within sTUC
  XXX this, at least, definitely needs a sTUC added.
 sub"class"es that override it:
  only Composite: calls _getTupleAndClear, NOT within sTUC. But it
  works out, just because TupleDesc.java's native _formTuple method uses
  JavaMemoryContext. Spooky action at a distance?


Results from triggers:
 Function.c's invokeTrigger does sTUC around the getTriggerReturnTuple.

In passing, fix a long-standing thinko in Invocation_popInvocation:
the memory context that was current on entry is stored in upperContext
of *this* Invocation, but popInvocation was 'restoring' the one that was
saved in the *previous* Invocation.

Also in passing, move the cleanEnqueuedInstances step later in the
pop sequence, improving its chance of seeing instances that could become
unreachable through the release of SPI contexts or the JNI local frame.
This can reveal issues with the nesting of SPI 'connections' or
management of their associated memory contexts.

Without the special treatment, the instance of the Java class
Invocation, if any, that corresponds to the C Invocation, has its
lifetime simply bounded to that of the C Invocation, rather than
artificially extended across a sequence of SRF value-per-call
invocations. It is simpler, does not break any existing tests, and
is less likely to be violating PostgreSQL assumptions on correct
behavior.

The commits merged here into this branch simplify PL/Java's management
of the PostgreSQL-to-PL/Java-function invocation stack, and especially
simplify the handling of SPI (PostgreSQL's Server Programming Interface)
and set-returning functions.

SPI includes "connect" and "finish" operations normally used in a simple
pattern: connect before using SPI functions, finish when done and before
returning to the caller, and if anything allocated while "connected" is
to be returned to the caller, be sure to allocate that in the "upper
executor" memory context (that is, the context that was current before
SPI_connect).

PL/Java has long diverged from that approach, especially for the case
of set-returning functions using the value-per-call protocol (the only
one PL/Java currently supports). If SPI was connected during one call
in the sequence, PL/Java has sought to save and reuse that connection
and its memory contexts over later calls (where a simpler, "by the book"
implementation would simply SPI_connect and SPI_finish within the
individual calls as needed).

It never seemed altogether clear that was a good idea, but at the same
time there weren't field reports of failure. It turns out, though, that
it is not hard to construct tests showing the apparent success was all luck.

It has not been much trouble to reorganize that code so that SPI is used
in the much simpler, by-the-book fashion. b2094ba fixes one place where
a needed switchToUpperContext was missing but the error was masked
by the former SPI juggling, and with that fixed, all the tests in
the CI script promptly passed, with SPI used in the purely nested way
that it expects.

One other piece of complexity that has been removed was the handling of
Java Invocation objects during set-returning functions. Although
the stack-allocated C invocation struct naturally lasts only through one
actual call, PL/Java's SRF code took pains to keep its Java counterpart
alive, as if the one instance represented the entire sequence of actual
calls while returning a set. Eliminating that behavior has simplified
the code and shown no adverse effect in the available tests.

As these are changes of some significance that might possibly alter
some behavior not tested here, they have not been made in the 1.6 or
1.5 branches. But the simplification seems to make a less brittle base
for the development going forward on this branch.

CacheMap is a generic class useful for (possibly weak or soft)
canonicalizing caches of things that are identified by one or more
primitive values. (Writing the key values into a ByteBuffer avoids
the allocation involved in boxing them; however, the API as it
currently stands might be exceeding that cost with instantiation
of lambdas. It should eventually be profiled, and possibly revised
into a less tidy, but more efficient, form.)
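The ByteBuffer-keying idea can be shown with an ordinary HashMap; the real CacheMap adds weak/soft canonicalization on top, and this mock ignores the lambda-instantiation cost caveat noted above.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: primitive identifiers are written into a ByteBuffer, which
// serves directly as the map key (ByteBuffer defines equals/hashCode
// over its remaining content), avoiding one boxed object per primitive.
public final class PrimitiveKeyedCache<V> {
    private final Map<ByteBuffer, V> map = new HashMap<>();

    private static ByteBuffer key(int classId, int objectId, int subId) {
        ByteBuffer b = ByteBuffer.allocate(12)
            .putInt(classId).putInt(objectId).putInt(subId);
        b.flip(); // ready for comparison/hashing over the written content
        return b;
    }

    public V computeIfAbsent(int classId, int objectId, int subId,
                             Supplier<V> maker) {
        return map.computeIfAbsent(key(classId, objectId, subId),
                                   k -> maker.get());
    }

    public static void main(String[] args) {
        PrimitiveKeyedCache<Object> cache = new PrimitiveKeyedCache<>();
        Object a = cache.computeIfAbsent(1259, 16384, 0, Object::new);
        Object b = cache.computeIfAbsent(1259, 16384, 0, Object::new);
        System.out.println(a == b); // same canonical instance: true
    }
}
```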

SwitchPointCache is intended for lazily caching numerous values
of diverse types, groups of which can be associated with a single
SwitchPoint for purposes of invalidation.

As currently structured, the SwitchPoints (and their dependent
GuardWithTest nodes) do not get stored in static final fields;
this may limit HotSpot's ability to optimize them as fully as
it could if they did.

Adapter is the abstract ancestor of all classes that implement
PostgreSQL datatypes for PL/Java, and the adt.spi package contains
classes that will be of use to datatype-implementing code:
in particular, Datum. PostgreSQL datums are only exposed
to Adapters, and the Adapter's job is to reliably convert between
the PostgreSQL type and some appropriate Java representation.

For some datatypes, there is a single or obvious appropriate Java
representation, and an Adapter may be provided that simply produces
that. For other datatypes, there may be no single obvious choice
of Java representation, either because there is no good match or
because there are several; an application might want to map types
to specialized classes available in some domain-specific library.
To serve those cases, Adapters can be defined in terms of
Adapter.Contract subinterfaces, which are simply functional interfaces
that document and expose the semantic components of the PostgreSQL
type. For example, a contract for PostgreSQL INTERVAL would expose
a 64-bit microseconds component, a 32-bit day count, and a 32-bit
month count. The division of responsibility is that the Adapter
encapsulates how to extract those components given a PostgreSQL
datum, but the contract fixes the semantics of what the components
are. It is then simple to use the Adapter, with any lambda that
conforms to the contract, to produce any desired Java representation
of the type.

Dummy versions of Attribute, RegClass, RegType, TupleDescriptor,
and TupleTableSlot break ground here on the model package, which
will consist of a set of classes modeling key PostgreSQL abstractions
and a useful subset of the PostgreSQL system catalogs.

RegType also implements java.sql.SQLType, making it usable in
(a suitable implementation of) JDBC to specify PostgreSQL types
precisely.

adt.spi.AbstractType needs the specialization() method that was
earlier added to internal.Function in anticipation of needing it
someday.

The org.postgresql.pljava.adt package contains 'contracts'
(subinterfaces of Adapter.Contract.Scalar or Adapter.Contract.Array),
which are functional interfaces that document and expose the exact
semantic components of PostgreSQL data types.

Adapters are responsible for the internal details of PostgreSQL's
representation that aren't semantically important, and code that
simply needs to construct some semantically faithful representation
of the type only needs to be concerned with the contract.
CharsetEncoding is not really a catalog object (the available
encodings in PostgreSQL are hardcoded) but is exposed here as
a similar kind of object with useful operations, including
encoding and decoding using the corresponding Java codec when
known.

CatalogObject is, of course, the superinterface of all things
that really are catalog objects (identified by a classId, an objectId,
and rarely a subId). This commit brings in RegNamespace and RegRole
as needed for CatalogObject.Namespaced and CatalogObject.Owned.
RolePrincipal is a bridge between a RegRole and Java's Principal
interface.

CatalogObject.Factory is a service interface 'used' by the API
module, and will be 'provided' by the internals module to supply
the implementations of these things.
And convert other code to use CharsetEncoding.SERVER_ENCODING
where earlier hacks were used, like the implServerCharset()
added to Session in 1.5.1.

In passing, fix a bit of overlooked java7ification in SQLXMLImpl.

The new CharsetEncodings example provides two functions:

SELECT * FROM javatest.charsets();

returns a table of the available PostgreSQL encodings, and what Java
encodings they could be matched up with.

SELECT * FROM javatest.java_charsets(try_aliases);

returns the table of all available Java charsets and the PostgreSQL ones
they could be matched up with, where the boolean try_aliases indicates
whether to try Java's known aliases for a charset when nothing in
PostgreSQL matched its canonical name. False matches happen when
try_aliases is true, so that's not a great idea.
These PostgreSQL notions will have to be available to Java code
for two reasons.

First, even code that has no business poking at them can still need
to know which one is current, to set an appropriate lifetime on
a Java object that corresponds to something in PostgreSQL allocated
in that context or registered to that owner. For that purpose, they
both will be exposed as subtypes of Lifespan, and the existing
PL/Java DualState class will be reworked to accept any Lifespan to
bound the validity of the native state.

Second, Adapter code could very well need to poke at such objects
(MemoryContexts, anyway): either to make a selected one current for
when allocating some object, or even to create and manage one.
Methods for that will not be exposed on MemoryContext or ResourceOwner
proper, but could be protected methods of Adapter, so that only
an Adapter can use them.
In addition to MemoryContextImpl and ResourceOwnerImpl proper, this step
will require reworking DualState so state lives are bounded by Lifespan
instances instead of arbitrary pointer values. Invocation will be made
into yet another subtype of Lifespan, appropriate for the life of an
object passed by PostgreSQL in a call and presumed good while the call
is in progress.

The DualState change will have to be rototilled through all of its
clients. That will take the next several commits.

The DualState.Key requirement that was introduced in 1.5.1 as a way to
force DualState-guarded objects to be constructed only in upcalls from C
(as a hedge against Java code inadvertently doing it on the wrong
thread) will go away. We *want* Adapters to be able to easily construct
things without leaving Java. Just don't do it on the wrong thread.
Though never very well publicized upstream, the examples of plpgsql,
plperl, and plpython show that a caller of BeginInternalSubTransaction
is expected to follow a certain pattern of saving and restoring the
memory context and resource owner, which PL/Java has not been doing.

Now it is easy to implement that.

https://www.postgresql.org/message-id/619EA06D.9070806%40anastigmatix.net
The current invocation can be the right Lifespan to specify for
a DualState that's guarding some object PostgreSQL passed in to
the call, which is expected to be good for as long as the call
is in progress.

In other, but related, news, Invocation can now return the
"upper executor" memory context: that is, whatever context was
current at entry, even if a later use of SPI changes the context
that is current.

It can appear tempting to eliminate the special treatment of PgSavepoint
in Invocation, and simply make it another DualState client, but because
of the strict nesting imposed on savepoints, keeping just the one
reference to the first one set suffices, and is more efficient.
Simplify these: their C callers were passing unconditional null
as the ResourceOwner before, which their Java constructors passed
along unchanged. Now just have the Java constructor pass null
as the Lifespan.
These DualState clients were previously passing the address of
the current invocation struct as their "resource owner", again from
the C code, passed along by the Java constructor. Again simplify
to call Invocation.current() right in the Java constructor and use
that as the Lifespan.

On a side note, the legacy Relation class included here (and its
legacy Tuple and TupleDesc) will naturally be among the first
candidates for retirement when this new model API is ready.
This legacy Portal class is called from C and passed the address
of the PostgreSQL ResourceOwner associated with the Portal itself.
This is only an intermediate refactoring of VarlenaWrapper.
Construction of one is still set in motion from C. Ultimately,
it should implement Datum and be something that a Datum.Accessor
can construct with a minimum of fuss.
The DualState.Key cookie was originally a hedge against coding mistakes
during the introduction of DualState for 1.5.1 (which had to support
Java < 9). It is less necessary now that the internals are behind JPMS
encapsulation, and the former checks for the cookie can be replaced
with assertions that the action is happening on the right thread. The
CI tests run with
The commits grouped under this merge add API to expose in Java
the PostgreSQL notions of MemoryContext and ResourceOwner, and then
rework PL/Java's DualState class (which manages objects that combine
some Java state and some native state, and may need specified actions
to occur if the Java state becomes unreachable or explicitly released
or if a lifespan bounding the native state expires). A DualState now
accepts a Lifespan, of which MemoryContext and ResourceOwner are both
subtypes. So is Invocation, an obvious lifespan for things PostgreSQL
passes in that are expected to be valid for the duration of the call.

The remaining commits in this group propagate the changes through
the affected legacy code.
Fitting it into the new scheme is not entirely completed here;
for example, newReadable takes a Datum.Input parameter, but still
casts it internally to VarlenaWrapper.Input. Making it interoperate
with any Datum.Input may be a bit more work.

Likewise, newReadable with synthetic=true still encapsulates all
the knowledge of what datatypes there is synthetic-XML coverage
for and selecting the right VarlenaXMLRenderer for it (there's
that varlena-specificity again!). More of that should be moved
out of here and into an Adapter.

In passing, fix a couple typos in toString() methods, and add
a serviceable, if brute-force, getString() method to Synthetic.
It would be better for SyntheticXMLReader to gain the ability to
produce character-stream output efficiently, but until that
happens, there needs to be something for those moments when you
just want a string to look at and shouldn't have to fuss to get it.

For now, VarlenaWrapper.Input and .Stream still extend, and add small
features like toString(Object) to, DatumImpl. Later work can probably
migrate those bits so VarlenaWrapper will only contain logic specific
to varlenas.

An adt.spi interface Verifier is added, though Datum doesn't yet
expose any way to use it; in this commit, only one method accepting
Verifier.OfStream is added in DatumImpl.Input.Stream, the minimal
change needed to get things working.
As before, JNI methods for this 'model' framework continue to
be grouped together in ModelUtils.c; their total number and
complexity are expected to be low enough for that to be practical,
and then they can all be seen in one place.

RegClassImpl and RegTypeImpl acquire m_tupDescHolder arrays in
this commit, without much explanation; that will come a few commits
later.
There are two flavors so far, Deformed and Heap. Deformed works
with whatever a real PostgreSQL TupleTableSlot can work with,
relying on the PostgreSQL implementation to 'deform' it into
separate datum and isnull arrays. (That doesn't have to be a
PostgreSQL 'virtual' TupleTableSlot; it can do the deforming
independently of the type of slot. When the time comes to
implement the reverse direction and produce tuples, a virtual
slot will be the way to go for that, using the PostgreSQL C code
to 'form' it once populated.)

The Heap flavor knows enough about that PostgreSQL tuple format
to 'deform' it in Java without the JNI calls (except where some
out-of-line value has to be mapped, or for varlena values until
VarlenaWrapper sheds more of its remaining JNI-centricity). The
Heap implementation does not yet do anything clever to memoize
the offsets into the tuple, which makes the retrieval of all
the tuple's values an O(n^2) proposition; there is a
low-hanging-fruit optimization opportunity there. For now, it gets
the job done.

It might be interesting to see how the two flavors compare on
typical heap tuples: Deformed, making more JNI calls but relying
on PostgreSQL's fast native deforming, or Heap, which can avoid
more JNI calls, and also avoids deforming something into a fresh
native memory allocation if the only thing it will be used for is
to immediately construct some Java object.

The Heap flavor can do one thing the Deformed flavor definitely
cannot: it can operate on heap-tuple-formatted contents of an
arbitrary Java byte buffer, which in theory might not even be
backed by native memory. (Again, for now, this is slightly science
fiction where varlena values are concerned, because VarlenaWrapper
retains a lot of its native dependencies. A ByteBuffer "heap tuple"
with varlenas in it will have to be native-backed for now.) The
selection of the DualState guard by heapTupleGetLightSlot() is
currently more hardcoded than that would suggest; it assumes the
buffer is mapping memory that can be heap_free_tuple'd.

The 'light' in heapTupleGetLightSlot really means that there isn't
an underlying PostgreSQL TupleTableSlot constructed.

The whole business of how to apply and use DualState guards on these
things still needs more attention.

There is also Heap.Indexed, which is the thing needed for arrays.
When the element type is fixed-length, it achieves O(1) access
(plus null-bitmap processing if there are nulls). It uses a "count
preceding null bits ahead of time" strategy that could also easily
be adopted in Heap.
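The "count preceding null bits" strategy can be sketched in isolation. Nothing here is PL/Java's actual code; it only shows how a PostgreSQL-style null bitmap (a set bit meaning the value is present) lets the offset of a fixed-length element be computed after a cheap popcount pass.

```java
// Sketch only: how "count preceding null bits ahead of time" gives O(1)
// element offsets for fixed-length types; not PL/Java's actual code.
public final class NullBitmap {
    private final byte[] bits; // PostgreSQL convention: bit i set => element i not null

    public NullBitmap(byte[] bits) { this.bits = bits; }

    public boolean isNull(int i) {
        return (bits[i >> 3] & (1 << (i & 7))) == 0;
    }

    /** Number of null elements with index strictly less than i. */
    public int nullsBefore(int i) {
        int nulls = 0;
        int fullBytes = i >> 3;
        for (int b = 0; b < fullBytes; b++)
            nulls += 8 - Integer.bitCount(bits[b] & 0xFF);
        int rem = i & 7;
        if (rem != 0)
            nulls += rem - Integer.bitCount(bits[fullBytes] & ((1 << rem) - 1));
        return nulls;
    }

    /** Byte offset of fixed-length element i (nulls occupy no storage). */
    public int offsetOf(int i, int elemLength) {
        return (i - nullsBefore(i)) * elemLength;
    }

    public static void main(String[] args) {
        // 0xF5 = 1111_0101: elements 1 and 3 are null
        NullBitmap bm = new NullBitmap(new byte[] { (byte) 0xF5 });
        System.out.println(bm.offsetOf(4, 8)); // 16: two non-null elements precede index 4
    }
}
```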

A NullableDatum flavor is also needed, which would be the thing for
mapping (as one prominent example) function-call arguments.

The HeapTuples8 and HeapTuples4 classes at the end are scaffolding
and ought to be factored out into something with a decent API, as
hinted at in the comment preceding them.

A Heap instance still inherits the values/nulls array fields used
in the deformed case, without (at present) making any use of them.
It is possible some use could be made (as, again, an underlying PG
TupleTableSlot could be used in deforming a heap tuple), but it's
also possible that won't ever be needed, and the class could be
refactored to a simpler form.
Here's how this is going to work.

The "exists because mentioned" aspect of a CatalogObject is
a lightweight operation, just caching/returning a singleton with
the mentioned values of classId/objId/(subId?).

For a bare CatalogObject (objId unaccompanied by classId), that's
all there is. But for any CatalogObject.Addressed subtype, the
classId and objId together identify a tuple in a particular system
catalog (or, that is, identify a tuple that could exist in that
catalog). And the methods on the Java class that return information
about the object get the information by fetching attributes from
that tuple, then constructing whatever the Java representation
will be.

Not to duplicate the work of fetching (the tuple itself, and then
an attribute from the tuple) and constructing the Java result, an
instance will have an array of SwitchPointCache-managed "slots"
that will cache, lazily, the constructed results. Five of those
slots have their indices standardized right here in CatalogObjectImpl,
to account for the name, namespace, owner, and ACL of objects that
have those things. Slot 0 is for the tuple itself.

When an uncached value is requested, the "computation method" set up
for that slot will execute (always on the PG thread, so it can
interact with PostgreSQL with no extra ceremony). Most computation
methods will begin by calling cacheTuple() to obtain the tuple
itself from slot 0, and then will fetch the wanted attribute from it
and construct the result. The computation method for cacheTuple(),
in turn, will obtain the tuple if that hasn't happened yet, usually
from the PostgreSQL syscache. We copy it to a long-lived memory
context where we can keep it until its invalidation.

The most common way the cacheTuple is fetched is by a one-argument
syscache search by the object's Oid. When that is all that is needed,
the Java class need only implement cacheId() to return the number
of the PostgreSQL syscache to search in. For exceptional cases
(attributes, for example, require a two-argument syscache search),
a class should just provide its own cacheTuple computation method.

The slots for an object are associated with a Java SwitchPoint,
and the mapping from the object to its associated SwitchPoint
is a function supplied to the SwitchPointCache.Builder. Some
classes, such as RegClass and RegType, will allocate a SwitchPoint
per object, and can be selectively invalidated. Otherwise, by
default, the s_globalPoint declared here can be used, which will
invalidate all values of all slots depending on it.
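The JDK mechanism underneath SwitchPointCache can be seen in miniature (the cache itself builds far more machinery around it): a handle guarded by a SwitchPoint keeps invoking the cached path until the point is invalidated, after which every call falls through to the recomputation path. The class and method names below are invented for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class GuardDemo {
    static String cachedValue() { return "cached"; }
    static String recompute()   { return "recomputed"; }

    static String[] run() throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType mt = MethodType.methodType(String.class);
        MethodHandle cached = l.findStatic(GuardDemo.class, "cachedValue", mt);
        MethodHandle fresh  = l.findStatic(GuardDemo.class, "recompute", mt);

        SwitchPoint sp = new SwitchPoint();
        // The guarded handle behaves as 'cached' until sp is invalidated.
        MethodHandle slot = sp.guardWithTest(cached, fresh);

        String before = (String) slot.invokeExact();
        SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
        String after = (String) slot.invokeExact();
        return new String[] { before, after };
    }

    public static void main(String[] args) throws Throwable {
        String[] r = run();
        System.out.println(r[0] + " -> " + r[1]); // cached -> recomputed
    }
}
```

Invalidation is one-way: a spent SwitchPoint never guards again, which is why a cache must install a fresh SwitchPoint when it wants values recomputed lazily after an invalidation.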
RegClass and RegType are the two CatalogObjects with tupleDescriptor() methods.

You can get strictly more tuple descriptors by asking RegType;
a RegType.Blessed can give you a tuple descriptor that has been
interned in the PostgreSQL typcache and corresponds to nothing
in the system catalogs. But whenever a RegType t is an ordinary
cataloged composite type or the row type of a cataloged relation,
then there is a RegClass c such that c == t.relation() and
t == c.type(), and you will get the same tuple descriptor from
the tupleDescriptor() method of either c or t.

In all but one such case, c delegates to c.type().tupleDescriptor()
and lets the RegType do the work, obtaining the descriptor from
the PG typcache.

The one exception is when the tuple descriptor for pg_class itself
is wanted, in which case the RegClass does the work, obtaining the
descriptor from the PG relcache, and RegType delegates to it for
that one exceptional case. The reason is that RegClass will see
the first request for the pg_class tuple descriptor, and before that
is available, c.type() can't be evaluated.

In either case, whichever class looked it up, a cataloged tuple
descriptor is always stored on the RegClass instance, and RegClass
will be responsible for its invalidation if the relation is altered.
(A RegType.Blessed has its own field for its tuple descriptor,
because there is no corresponding RegClass for one of those.)

Because of this close connection between RegClass and RegType,
the methods RegClass.type() and RegType.relation() use a handshake
protocol to ensure that, whenever either method is called, not only
does it cache the result, but its counterpart for that result instance
caches the reverse result, so the connection can later be traversed
in either direction with no need for a lookup by oid.

In the static initializer pattern introduced here, the handful of
SwitchPointCache slots that are predefined in CatalogObject.Addressed
are added to, by starting an int index at Addressed.NSLOTS,
incrementing it to initialize additional slot index constants, then
using its final value to define a new NSLOTS that shadows the original.
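A minimal sketch of that initializer pattern, with invented class and slot names (the real classes hold much more):

```java
// Invented names; illustrates only the index-shadowing pattern described above.
class Addressed {
    static final int SLOT_TUPLE = 0, SLOT_NAME = 1, SLOT_NAMESPACE = 2,
                     SLOT_OWNER = 3, SLOT_ACL = 4;
    static final int NSLOTS = 5;
}

class SubCatalogImpl extends Addressed {
    static final int SLOT_TUPDESC;
    static final int SLOT_LENGTH;
    static final int NSLOTS; // shadows Addressed.NSLOTS for this class

    static {
        int i = Addressed.NSLOTS; // start where the superclass left off
        SLOT_TUPDESC = i++;
        SLOT_LENGTH  = i++;
        NSLOTS       = i;         // this class's slot array is sized NSLOTS
    }

    public static void main(String[] args) {
        System.out.println(SubCatalogImpl.NSLOTS); // 7
    }
}
```

A further subclass can repeat the trick, starting its own index at SubCatalogImpl.NSLOTS.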
An Attribute is most often obtained from a TupleDescriptor
(in this API, that's how it's done), and the TupleDescriptor
can supply a version of Attribute's tuple directly; no need
to look it up anywhere else. That copy, however, cuts off
at ATTRIBUTE_FIXED_PART_SIZE bytes. The most commonly needed
attributes of Attribute are found there, but for others beyond
that cutoff, the full tuple has to be fetched from the syscache.

So AttributeImpl has the normal SLOT_TUPLE slot, used for the
rarely-needed full tuple, and also its own SLOT_PARTIALTUPLE,
for the truncated version obtained from the containing tuple
descriptor. Most computation methods will fetch from the partial
one, with the full one referred to only by the ones that need it.

It doesn't end there. A few critical Attribute properties, byValue,
alignment, length, and type/typmod, are needed to successfully fetch
values from a TupleTableSlotImpl.Heap. So Attribute cannot use that
API to fetch those values. For those, it must hardcode their actual
offsets and sizes in the raw ByteBuffer that the containing tuple
descriptor supplies, and fetch them directly. So there is also
a SLOT_RAWBUFFER.

This may sound more costly in space than it is. The raw buffer,
of course, is just a ByteBuffer sliced off and sharing the larger
one in the TupleDescriptor, and the partial tuple is just a
TupleTableSlot instance built over that. The full tuple is another
complete copy, but only fetched when those less-commonly-needed
attributes are requested.

With those key values obtained from the raw buffer, the Attribute's
name does not require any such contortions, and can be fetched using
the civilized TupleTableSlot API, except it can't be done by name,
so the attribute number is used for that one.

An AttributeImpl.Transient holds a direct reference to
the TupleDescriptor it came from, which its containingTupleDescriptor()
method returns. An AttributeImpl.Cataloged does not, and instead holds
a reference to the RegClass for which it is defined in the system
catalogs, and containingTupleDescriptor() delegates to tupleDescriptor()
on that. If the relation has been altered, that could return an updated
new tuple descriptor.
RegClass is an easy choice, because those invalidations are also
the invalidations of TupleDescriptors, and because it has a nice
API; we are passed the oid of the relation to invalidate, so we
acquire the target in O(1).

(Note in passing: AttributeImpl is built on SwitchPointCache in
the pattern that's emerged for CatalogObjects in general, and an
AttributeImpl.Cataloged uses the SwitchPoint of the RegClass, so
it's clear that all the attributes of the associated tuple
descriptor will do the right thing upon invalidation. In contrast,
TupleDescriptorImpl itself isn't quite built that way, and the
question of just how a TupleDescriptor itself should act after
invalidation hasn't been fully nailed down yet.)

RegType is probably also worth invalidating selectively, as is
probably RegProcedure (procedures are mainly what we're about
in PL/Java, right?), though only RegType is done here.

That API is less convenient; we are passed not the oid but a hash
of the oid, and not the hash that Java uses. The solution here is
brute force, to get an initial working implementation. There are
plenty of opportunities for optimization.

One idea would be to use a subclass of SwitchPoint that would set
a flag, or invoke a Runnable, the first time its guardWithTest
method is called. If that hasn't happened, there is nothing to
invalidate. The Runnable could add the containing object into some
data structure more easily searched by the supplied hash. Transitions
of the data structure between empty and not-empty could be propagated
to a boolean in native memory, where the C callback code could avoid
the Java upcall entirely if there is nothing to do. This commit
contains none of those optimizations.
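The "subclass of SwitchPoint" idea could look roughly like this; none of it is in the commit, and all the names are invented:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.SwitchPoint;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: a SwitchPoint that runs a Runnable the first time it
// actually guards anything, so invalidation work can be skipped for
// points on which nothing yet depends.
public class NotingSwitchPoint extends SwitchPoint {
    private final AtomicBoolean used = new AtomicBoolean();
    private final Runnable onFirstUse;

    public NotingSwitchPoint(Runnable onFirstUse) { this.onFirstUse = onFirstUse; }

    @Override
    public MethodHandle guardWithTest(MethodHandle target, MethodHandle fallback) {
        if (used.compareAndSet(false, true))
            onFirstUse.run(); // e.g. index this point in a hash-searchable structure
        return super.guardWithTest(target, fallback);
    }

    /** If false, a C-side callback could skip the Java upcall entirely. */
    public boolean anythingToInvalidate() { return used.get(); }

    public static void main(String[] args) {
        NotingSwitchPoint sp =
            new NotingSwitchPoint(() -> System.out.println("first use"));
        System.out.println(sp.anythingToInvalidate()); // false
        sp.guardWithTest(MethodHandles.constant(int.class, 1),
                         MethodHandles.constant(int.class, 2));
        System.out.println(sp.anythingToInvalidate()); // true
    }
}
```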

Factory.invalidateType might be misnamed; it could be syscacheInvalidate
and take the syscache id as another parameter, and then dispatch to
invalidating a RegType or RegProcedure or what have you, as the case
may be.

At least, that would be a more concise implementation than providing
separate Java methods and having the C callback decide which to call.
But if some later optimization is tracking anything-to-invalidate?
separately for them, then the C code might be the efficient place
for the check to be done.

PostgreSQL has a limited number of slots for invalidation callbacks,
and requires a separate registration (using another slot) for each
syscache id for which callbacks are wanted (even though you get
the affected syscache id in the callback?!). It would be antisocial
to grab one for every sort of CatalogObject supported here, so we
will have many relying on CatalogObject.Addressed.s_globalPoint
and some strategy for zapping that every so often. That is not
included in this commit. (The globalPoint exists, but there is
not yet anything that ever zaps it.)

Some imperfect strategy that isn't guaranteed conservative might
be necessary, and might be tolerable (PL/Java has existed for years
with less attention to invalidation). An early idea was to zap the
globalPoint on every transaction or subtransaction boundary, or when
the command counter has been incremented; those are times when
PostgreSQL processes invalidations. However, invalidations are also
processed any time locks are acquired, and that doesn't sound as if
it would be practical to intercept (or as if the resulting behavior
would be practical, even if it could be done).

Another solution approach would just be to expose a zapGlobalPoint
knob as API; if some code wants to be sure it is not seeing something
stale (in any CatalogObject we aren't doing selective invalidation for),
it can just say so before fetching it.
jcflack added 2 commits April 25, 2025 18:01
This DualState subclass used to free the associated tuple table in the
javaStateUnreachable lifespan event; now, only at javaStateReleased.
It turns out that SPI_freetuptable, since postgres/postgres@3d13623, has
contained code to raise a warning if the tuple table being freed does not
belong to the current SPI connection. With the earlier javaStateUnreachable
handling, that warning could be triggered on rare and irksome occasions
when Java's GC happened to find, during a nested invocation of some Java
function, that a tuple table from an outer invocation had become
unreachable.

It would be conceivable to have javaStateUnreachable try to determine
if the current nest level matches that of the tuple table's creation,
and free it if so at least, otherwise leaking it to the exit of the outer
call. But for now it's also conceivable to just do nothing and let
the context reset at invocation exit mop things up.

jcflack commented Apr 26, 2025

A PL/Java-based language can handle columns/expressions of concrete type anyarray

The PostgreSQL type ANYARRAY, normally a polymorphic type that would only be seen in a routine's inputsTemplate or outputsTemplate prior to resolution at an actual call site, can in very particular circumstances be seen even after resolution, in a routine's inputsDescriptor or outputsDescriptor. It is not normally possible to declare uses of ANYARRAY as a concrete type, but certain columns in PostgreSQL-supplied statistics-related catalog tables are declared that way, and the type will be seen for those columns or expressions involving them.

Such a column will always hold an array, but different rows may hold arrays of different element types. A method on Adapter.Array, elementType(), will supply an Adapter that produces the element type of an ANYARRAY-typed array. Once the element type is known, a suitable Adapter for that type can be chosen, and used to construct an array adapter for access to the array's content.

Dispatcher now supports languages implementing TRANSFORM FOR TYPE

If a PL's implementing class does not implement the UsingTransforms interface, the dispatcher will automatically reject routine declarations (at validation time, or at call time in case such a routine got created while validation couldn't happen) that include TRANSFORM FOR TYPE. The PL implementation does not have to concern itself with that, and this avoids the case where PostgreSQL allows routine declarations with TRANSFORM FOR TYPE for transform-unaware PLs where they will have no effect.

If a PL does implement UsingTransforms, the dispatcher will make sure that any Transform mentioned in a routine declaration for that language satisfies the language's essentialTransformChecks method. Because the fromSQL and toSQL functions of a transform have similar signatures in SQL to other functions that aren't transform functions at all, and PostgreSQL does not prevent CREATE TRANSFORM naming inappropriate functions, the essentialTransformChecks method should make a diligent effort to ensure that any proposed Transform has fromSQL/toSQL functions the PL will be able to use.

Implementing that UsingTransforms method is only the start of the PL handler's job. The handler is also responsible for the entirety of whatever that PL will do to accomplish the results of the transforms. It will probably begin, in its prepare method, by consulting the memo's transforms() method to learn what transforms, if any, should be applied.

The Glot64 example language handler now contains example code involving transforms.

jcflack added 5 commits May 9, 2025 16:03
An implementation of PLJavaBasedLanguage may also implement
ReturningSets. If it does, its prepareSRF method, not the usual prepare,
will be used when the target RegProcedure's returnsSet() is true.

The prepareSRF method must return an SRFTemplate. Further, what it
returns must also implement one of SRFTemplate's member subinterfaces,
ValuePerCall or Materialize (PostgreSQL might add additional options in
the future).

The base interface, SRFTemplate, has an abstract 'negotiate' method, to
be passed a list of the subinterfaces the caller is prepared to accept,
and return the index of one from the list that the routine will use.
Each subinterface has a default implementation that will find itself in
the caller's list. A class that implements more than one of the
subinterfaces will inherit conflicting defaults, and therefore have to
provide its own 'negotiate' implementation.

The list of interfaces acceptable to the caller is ordered so as to
reflect the caller's preference. A simple negotiate method could return
the index of the first interface in the list that this SRFTemplate
happens to implement. A more sophisticated one might take properties of
the prepared routine into account.
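A rough shape for that negotiation, with all signatures invented for illustration (the commit's actual SRFTemplate API may differ):

```java
import java.util.List;

// Invented signatures; illustrates only the negotiation scheme described above.
interface SRFTemplate {
    int negotiate(List<Class<? extends SRFTemplate>> acceptable);

    interface ValuePerCall extends SRFTemplate {
        @Override
        default int negotiate(List<Class<? extends SRFTemplate>> acceptable) {
            return acceptable.indexOf(ValuePerCall.class); // finds itself in the list
        }
    }

    interface Materialize extends SRFTemplate {
        @Override
        default int negotiate(List<Class<? extends SRFTemplate>> acceptable) {
            return acceptable.indexOf(Materialize.class);
        }
    }
}

// Implements both subinterfaces, so the conflicting defaults force an
// override: here, pick the first (most caller-preferred) interface that
// this template implements.
class FlexibleTemplate implements SRFTemplate.ValuePerCall, SRFTemplate.Materialize {
    @Override
    public int negotiate(List<Class<? extends SRFTemplate>> acceptable) {
        for (int i = 0; i < acceptable.size(); i++)
            if (acceptable.get(i).isInstance(this))
                return i;
        return -1;
    }

    public static void main(String[] args) {
        FlexibleTemplate t = new FlexibleTemplate();
        System.out.println(t.negotiate(
            List.of(SRFTemplate.Materialize.class, SRFTemplate.ValuePerCall.class))); // 0
    }
}
```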

The ValuePerCall interface specifies a specializeValuePerCall method
that is expected to return an SRFFirst. SRFFirst has a firstCall method
that should return an SRFNext instance. SRFNext has a nextResult method
that will be called as many times as necessary. It should use
fcinfo->result / fcinfo->isNull to store result values for one row, like
any non-set-returning function, but return SINGLE, MULTIPLE, or END to
indicate whether another call is expected. SRFNext implements
AutoCloseable, and its close() method will be called after the last call
made to nextResult (which may happen before nextResult returns END, if
PostgreSQL does not need all the results). The case where nextResult
returns SINGLE is an exception, treated as returning only that one row,
and close() will not be called.
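A toy driver for that call sequence: the names SRFNext, Result, and nextResult follow the text, but everything else, including a List standing in for fcinfo->result, is invented.

```java
import java.util.ArrayList;
import java.util.List;

public class ValuePerCallDemo {
    enum Result { SINGLE, MULTIPLE, END }

    interface SRFNext extends AutoCloseable {
        Result nextResult(List<Integer> out); // stores at most one row per call
        @Override void close();               // called after the last nextResult call
    }

    /** Produces rows 0..n-1, one per call, then END with no row stored. */
    static SRFNext countTo(int n) {
        return new SRFNext() {
            int next = 0;
            public Result nextResult(List<Integer> out) {
                if (next >= n)
                    return Result.END;
                out.add(next++);
                return Result.MULTIPLE; // another call is expected
            }
            public void close() { /* release per-call state */ }
        };
    }

    /** The caller's side: call nextResult until not MULTIPLE, then close(). */
    static List<Integer> collect(SRFNext srf) {
        List<Integer> rows = new ArrayList<>();
        try (srf) {
            while (srf.nextResult(rows) == Result.MULTIPLE)
                ; // keep calling
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(collect(countTo(3))); // [0, 1, 2]
    }
}
```

The SINGLE case (one row, no close()) is not exercised here; a real dispatcher would need to distinguish it from MULTIPLE before entering the loop.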

The Materialize interface will likewise specify a specializeMaterialize
method, but the details are TBD, so at this stage of the API the
Materialize interface is a stub and does not specify any usable behavior
yet.
CleanupTracker is only used when assertions are enabled, and checks
that entries and exits of 'cleanup' loops (cleanEnqueuedInstances for
Java-released or unreachable instances, nativeRelease for expired
native lifespans) are properly paired. It originally prohibited
any more than one cleanup loop being in progress at any time, and
was relaxed in 95d4133 to allow one of each type, assuming there
would be no good reason for such a loop to be reentered.

The generalization in 5a9cdf9 from "resource owner" to Lifespan, with
a possibly-extensible set of Lifespan objects each heading its own list
of dependents, introduced a realistic possibility that some class that
is itself bounded by a Lifespan could also be a Lifespan for other
classes.

A practical application arises in modeling the PostgreSQL ExprContext.
The chief (perhaps only) use of ExprContext in PL/Java is, via its
callback, to signal when no more output is needed from a ValuePerCall
set-returning function that may not have been read to the end. This is
a use of ExprContext as a Lifespan to bound an SRFRoutine.

PostgreSQL's ExprContext, however, does not invoke its callbacks if
error cleanup is afoot, which (ironically enough) would leave its Java
mirror un-cleaned-up in error cases. That can be addressed by also modeling
that ExprContext itself has a lifespan bounded by its per-query memory
context. So ExprContext both has a Lifespan, and is a Lifespan, and in
error cases where the lifespanRelease of the memory context cascades to
lifespanRelease of the ExprContext, the test made here by CleanupTracker
was too restrictive.

The pattern of a Lifespan with a Lifespan does not seem likely to become
so widespread as to cause frequent or deep reentry of lifespanRelease, and
the present example seems a legitimate and reasonable case, so relax the
CleanupTracker assertion to allow it.

In a related change, don't let exceptions abort lifespanRelease:
cleanEnqueuedInstances was already swallowing exceptions (citing JDK 9
Cleaner as precedent) so they would not prevent processing of later
queue entries, but lifespanRelease did not do the same.  Now it does.

It might be better one day to collect exceptions (perhaps as a suppressed
list) to report after the loop.
ExprContextImpl is in the same o.p.p.pg package as implementations of
API-exposed interfaces in o.p.p.model, but there may be no need for any
such API interface, so this is purely for use in the internal module
for now.

For PL/Java's purposes, the chief use of ExprContext is, via its
callback, to be usable as the Lifespan of a DualState instance
associated with the row-collecting activity of a set-returning function
in ValuePerCall mode, whose nativeStateReleased event can trigger
calling the close() method (and resetting internal dispatcher state)
when PostgreSQL collects fewer rows than the function intends. This use
is all internal to the dispatcher.

A PostgreSQL ExprContext carries other information of interest, such as
the per-query memory context, but there is no need to fastidiously
follow PostgreSQL by having the accessor method for that on ExprContext.
As that memory context is needed before the very first call on
a set-returning function, before it is even known whether a Java mirror
of the ExprContext will need to be constructed, it will be better to
fetch that memory context eagerly and put an accessor for it on
ReturnSetInfoImpl instead.

With that in mind, this model class provides no accessor methods at all,
and simply exists to be used as a Lifespan.
Implementation of Materialize mode has to wait, as the API interface
SRFTemplate.Materialize is still only a stub with details TBD.

As contemplated in the implementation of ExprContextImpl, the per-query
memory context is here made available with an accessor method directly
on ReturnSetInfoImpl. It is passed eagerly up by the C dispatch code
through the Java entry points to be readily available, as it will
certainly be needed when any set-returning function is to be called.
As long as the store direction of TupleTableSlot remains unimplemented,
null / void / zero are still the only values any Glot64 function can
return. But now it can return sets of them!

For a Glot64 set-returning function, there are stricter limits on the
source string. It must be a base64 string that, when 'compiled' (i.e.,
decoded), is the string representation of a decimal integer.  The
function so defined will return that many rows (of null / void / zero),
ignoring any arguments.

If the source string 'compiles' to a negative integer, a single row is
returned, exercising the SRFNext.Result.SINGLE case.
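The source-string rule above can be sketched in plain Java. The helper rowsFor below is hypothetical, not part of the Glot64 handler; it just applies the same decode-then-parse interpretation described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Glot64SrfSource
{
	/*
	 * Hypothetical helper mirroring the rule described above: a Glot64
	 * SRF source string must be base64 that 'compiles' (decodes) to the
	 * string form of a decimal integer, the number of rows to return.
	 * A negative count exercises the SRFNext.Result.SINGLE case.
	 */
	static int rowsFor(String source)
	{
		String decoded = new String(
			Base64.getDecoder().decode(source), StandardCharsets.US_ASCII);
		return Integer.parseInt(decoded.trim());
	}

	public static void main(String[] args)
	{
		System.out.println(rowsFor("Mw=="));  // base64 of "3": three rows
		System.out.println(rowsFor("LTE=")); // base64 of "-1": SINGLE case
	}
}
```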
@jcflack
Contributor Author

jcflack commented May 9, 2025

How a PL/Java-based language supports set-returning functions

An implementation of PLJavaBasedLanguage may also implement ReturningSets. If it does, its prepareSRF method, not the usual prepare, will be used when the target RegProcedure's returnsSet() is true. (If a PL/Java-based language does not implement ReturningSets, PL/Java's dispatcher will reject any such RegProcedure at validation and at dispatch time, so a language that does not intend to support set return does not have to concern itself with those details.)

The prepareSRF method must return an SRFTemplate. Further, what it returns must also implement one or more of SRFTemplate's member subinterfaces, ValuePerCall or Materialize (PostgreSQL might add additional options in the future).

The base interface, SRFTemplate, has an abstract negotiate method, to be passed a list of the subinterfaces the caller is prepared to accept, and to return the index of the one from the list that the routine will use. Each subinterface has a default implementation that finds its own interface in the caller's list. A class that implements more than one of the subinterfaces inherits conflicting defaults, and therefore must provide its own negotiate implementation.

The list of interfaces acceptable to the caller is ordered so as to reflect the caller's preference. A simple negotiate method could return the index of the first interface in the list that this SRFTemplate happens to implement. A more sophisticated one might take properties of the prepared routine into account.
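A minimal sketch of that negotiation, using stand-in interfaces; the real signatures in the PL/Java API may differ, so everything here is an assumption for illustration only.

```java
import java.util.List;

public class NegotiateSketch
{
	// Stand-ins for the PL/Java API types described above; the
	// negotiate signature here is an assumption, not the real API.
	interface SRFTemplate
	{
		int negotiate(List<Class<? extends SRFTemplate>> accepted);
	}
	interface ValuePerCall extends SRFTemplate { }
	interface Materialize  extends SRFTemplate { }

	/*
	 * A template implementing both subinterfaces inherits conflicting
	 * default negotiate methods, so it supplies its own: the simple
	 * strategy of returning the index of the first interface in the
	 * caller's (preference-ordered) list that it implements.
	 */
	static class BothModes implements ValuePerCall, Materialize
	{
		public int negotiate(List<Class<? extends SRFTemplate>> accepted)
		{
			for ( int i = 0 ; i < accepted.size() ; ++ i )
				if ( accepted.get(i).isInstance(this) )
					return i;
			return -1; // nothing acceptable to both parties
		}
	}

	public static void main(String[] args)
	{
		SRFTemplate t = new BothModes();
		// caller prefers Materialize over ValuePerCall
		System.out.println(
			t.negotiate(List.of(Materialize.class, ValuePerCall.class)));
	}
}
```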

The ValuePerCall interface specifies a specializeValuePerCall method that is expected to return an SRFFirst. SRFFirst has a firstCall method that should return an SRFNext instance. SRFNext has a nextResult method that will be called as many times as necessary. It should use fcinfo.result / fcinfo.isNull to store result values for one row, like any non-set-returning function, but return SINGLE, MULTIPLE, or END to indicate whether another call is expected. SRFNext implements AutoCloseable, and its close() method will be called after the last call made to nextResult (which may happen before nextResult returns END, if PostgreSQL does not need all the results). The case where nextResult returns SINGLE is an exception, treated as returning only that one row, and close() will not be called.
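The ValuePerCall flow can be modeled in miniature. The real nextResult stores each row through fcinfo.result / fcinfo.isNull; this sketch substitutes a plain list as the row sink, and the Result enum and SRFNext shape are assumptions patterned on the description above, not the actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ValuePerCallSketch
{
	// Hypothetical mini-model of the protocol described above.
	enum Result { SINGLE, MULTIPLE, END }

	interface SRFNext extends AutoCloseable
	{
		Result nextResult();    // store one row, say whether more follow
		@Override void close(); // called after the last nextResult call
	}

	/*
	 * An SRFNext that yields rows from an iterator: MULTIPLE while rows
	 * remain, END (storing no row) once exhausted. The 'sink' list
	 * stands in for fcinfo.result in this self-contained sketch.
	 */
	static SRFNext over(Iterator<String> rows, List<String> sink)
	{
		return new SRFNext()
		{
			public Result nextResult()
			{
				if ( ! rows.hasNext() )
					return Result.END;  // no row stored this call
				sink.add(rows.next());  // stand-in for fcinfo.result
				return Result.MULTIPLE; // expect another call
			}
			public void close() { /* release per-call resources here */ }
		};
	}

	public static void main(String[] args)
	{
		List<String> out = new ArrayList<>();
		// a dispatcher-like loop; try-with-resources supplies close()
		try ( SRFNext next = over(List.of("a", "b").iterator(), out) )
		{
			while ( Result.MULTIPLE == next.nextResult() )
				continue;
		}
		System.out.println(out); // [a, b]
	}
}
```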

The Materialize interface will likewise specify a specializeMaterialize method, but the details are TBD, so at this stage of the API the Materialize interface is a stub and does not specify any usable behavior yet.

The Glot64 example language handler now has example code for set-returning functions. For as long as the store direction of TupleTableSlot remains unimplemented, null / void / zero are still the only values any Glot64 function can return. But now it can return sets of them!

@beargiles

Adding this for the record since I didn't see it mentioned above.

The existing implementation uses JNI. It has many benefits but is a real PITA to work with.

That led to the creation of 'Java Native Access' (JNA). It is much easier to use, since you only need to provide (loosely) the API interface and the location of the shared library. It can't do everything that JNI can, but it can do a lot.

This refactoring may want to look at how much of libpq can be implemented using JNA instead of JNI, and whether there would be a performance impact when doing so.

The main benefit of this approach would be 1) reducing the amount of work required by this project and 2) letting the PostgreSQL project maintain this functionality.

Finally, it's possible that anything that still requires JNI might be handled in an updated libpq or other library.

See java-native-access/jna for a project that has wrapped a ton of C libraries, including many specific to operating systems or CPU architectures.

@jcflack
Contributor Author

jcflack commented May 16, 2025

The existing implementation uses JNI. It has many benefits but is a real PITA to work with.

That led to the creation of 'Java Native Access' (JNA). It is much easier to use since you only need to provide (loosely) the API interface and location of the shared file.

Opportunities surely exist for replacing some JNI with Java's own more recent Foreign Function and Memory API, preserving PL/Java's lack of third-party runtime dependencies. Some of the newer Java code (Datum.Accessor is an example IIRC) is already type-parametrized in order to support a future FFM implementation.

That said, the opportunities for replacing JNI with FFM are limited by the typical need for things to happen at the boundaries (transformation of Java exceptions into ereports and vice versa, for example), already taken care of in Thomas's JNICalls.c. For that reason, a lot of the existing uses of JNI are in no danger of going away.

Those considerations are a bit orthogonal to achieving the goals and correcting the deficiencies described at the top of this PR.

libpq, being a library for processing of the on-the-wire communication between a PostgreSQL backend and a connected client, isn't used in PL/Java and hasn't many facilities that would be of use here.

jcflack added 15 commits May 23, 2025 12:49
Also move the SQLAction creating the pljavahandler language onto
the package declaration; no need of a dummy class for those.
PostgreSQL can sometimes pass an array so empty it has no dimensions.
The type List<String> assigned back in 7786fbf already had a comment
that it probably wasn't appropriate, as each String in the list really
represents a key=value pair and that should be made explicit.

On inspection of the PostgreSQL parser and transformRelOptions function,
it's clear the key is an SQL simple identifier, duplicates forbidden,
and the value is a String and never null, so a null-hostile, unmodifiable
Map<Identifier.Simple,String> fits the bill.

An assumption has to be made that the key is never a delimited identifier
with an '=' in it, so the first '=' in the stored value is unambiguously
the delimiter between the key and the value. That's the same assumption
made by PostgreSQL's untransformRelOptions (though, for the time being[0],
PostgreSQL does not reject a key with '=' in it when accepting unknown
custom options such as for foreign data wrappers).

The same Map<Simple,String> type is appropriate also for
Attribute.options() and Attribute.fdwoptions(), and for similar accessors
on foreign-data-related catalogs when those are implemented.

[0] https://www.postgresql.org/message-id/6830EB30.8090904%40acm.org
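A sketch of that parsing rule, with a plain String key standing in for Identifier.Simple; the helper below is hypothetical, not PL/Java code. Split each stored option at the first '=', forbid duplicate keys, and hand back an unmodifiable, null-hostile map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RelOptionsSketch
{
	/*
	 * Hypothetical helper mirroring the rule above: each stored option
	 * is "key=value", split at the FIRST '=' (keys are assumed never to
	 * contain one), duplicates forbidden. Map.copyOf returns a map that
	 * is both unmodifiable and null-hostile.
	 */
	static Map<String,String> parse(List<String> stored)
	{
		Map<String,String> m = new LinkedHashMap<>();
		for ( String kv : stored )
		{
			int eq = kv.indexOf('=');
			if ( eq < 0 )
				throw new IllegalArgumentException("no '=' in " + kv);
			if ( null != m.put(kv.substring(0, eq), kv.substring(eq + 1)) )
				throw new IllegalArgumentException("duplicate key in " + kv);
		}
		return Map.copyOf(m);
	}

	public static void main(String[] args)
	{
		// first '=' is the delimiter; later ones belong to the value
		System.out.println(parse(List.of("k=a=b")).get("k")); // a=b
		System.out.println(parse(List.of("fillfactor=70")).get("fillfactor"));
	}
}
```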
There are still others yet to be implemented, but these are easy enough.
Getting the descriptor from the typcache is handy when a relation has
an associated type, but not all kinds of relation have one. An index
or TOAST table doesn't, for example.

There was already one case that had to be handled by going to
the relcache, and that was to get the descriptor for pg_class itself,
which can't be expected to find its associated type before it knows
what its columns are. So that code path just needs to be used also
for the relation kinds that don't have an associated type.
RegClass should have accessors for AccessMethod and Tablespace
(Database has a Tablespace also).

RegClass indirectly reaches ForeignDataWrapper and ForeignServer.
Opting not to make ForeignTable a CatalogObject in its own right:
it barely qualifies. It is identified by the oid of the RegClass,
and functions more as an extension of that. Its options and
ForeignServer can just be given accessors on RegClass.

Accessors returning these things to be added in a later commit
(along with whitespace-only tidying of lines added here).
This commit includes whitespace-only tidying of lines added
in the previous commit.
ForeignTable is simply represented by two accessors added to RegClass.
When foreign-table info is wanted, a little class, which RegClass holds in
a single slot, gets instantiated and constructs both values.

The slot's invalidation still uses the RegClass switch point, rather
than also hooking invalidation for pg_foreign_table.

In passing, catch up with two pg_database attributes that changed
in PG 15 from name to text.
The merged work includes PR #533, which does away with the old Ptr2Long
union in favor of new PointerGetJLong and JlongGet(... conversions.
In merging, also convert the uses of Ptr2Long that were added on this
branch.
This continues the work started in the REL1_6_STABLE branch of fixing
javadoc errors that prevent a successful javadoc run with maximal
coverage. This fixes such errors that have been introduced in
the org.postgresql.pljava.internal module in this branch.
The only well-known collations pinned with compile-time symbols
remaining now are DEFAULT and C (postgres/postgres@51edc4c).
In PG 18, there is now a CompactAttribute struct found in
tuple descriptors (postgres/postgres@5983a4c) that contains
a field of like purpose, so the one in pg_attribute is gone
(postgres/postgres@02a8d0c). PL/Java deforming wasn't making
any use of it yet anyway.

Where a TupleDesc used to have an attrs offset that was exactly
where a sequence of Form_pg_attribute began, it now has a
compact_attrs offset where a sequence of CompactAttribute starts.
That is still followed by a sequence of Form_pg_attribute, so now
the Java code looking for those has to take the compact_attrs offset
and add natts * sizeof (CompactAttribute).
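The offset arithmetic amounts to the following sketch; the values passed in main are made-up placeholders, not the real PostgreSQL struct sizes.

```java
public class AttrOffsetSketch
{
	/*
	 * Hypothetical illustration of the PG 18 layout change described
	 * above: the sequence of Form_pg_attribute now begins after natts
	 * CompactAttribute structs, which start at the compact_attrs offset.
	 */
	static long formPgAttributeBase(
		long compactAttrsOffset, int natts, long sizeofCompactAttribute)
	{
		return compactAttrsOffset + (long)natts * sizeofCompactAttribute;
	}

	public static void main(String[] args)
	{
		// placeholder numbers only: offset 64, 3 attributes, 16 bytes each
		System.out.println(formPgAttributeBase(64, 3, 16)); // 64 + 3*16 = 112
	}
}
```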

The CompactAttribute structs were added upstream as a performance
optimization, and could perhaps be made use of here to good effect,
but for now just compute the offset and use the Form_pg_attribute
in the accustomed way. Most promising, perhaps, would be to have
TupleDescriptor make use of the attcacheoff member of the new struct.
The earlier work fixing run-busting errors in javadoc comments permits
the javadoc coverage for the o.p.p.internal module to be expanded to
cover package-private types and members.

In passing, add enough missing class-level javadoc comments to make
the resulting package listings somewhat presentable.
Interesting that Mac OS clang was the only compiler to spot it.
This doc comment was overlooked in 5d9836c.
@beargiles

My recent PR has the ability to build custom docker images containing the backend jar(s) and then run tests using standard java (spring boot). It is targeted towards end users who want to test their implementations in a "real world" setting.

(Not everyone uses Spring Boot, but enough people do that it's a good place to start since it's then easy to add ORMs etc.)

It should be easy to tweak it to support testing changes to pljava.jar as well. The custom docker image is based on the official postgresql repo, but it looks like it would be trivial to change the Dockerfile so that it includes a few additional files and the database then uses the local build.

This won't incorporate all of the latest pgxn goodies but it should be great for regression testing since the tests can more closely mimic the real world with ORMs etc.
