Getting structured data from PostgreSQL

Have you ever puzzled over how to return an elaborate structure with a tricky hierarchy from a PostgreSQL stored procedure, without writing a huge crutch in the application to parse tree structures that the developer had to squeeze into a flat relational table? If the answer is yes, read on...


Good day!
We all know that the result of a query to a relational DBMS is a table. A table, because of its rigid structure, imposes a number of restrictions on the data it carries. For example, the result of a query that contains a join is a denormalized structure that hides the original topology of the data, which makes the result harder for the application to process. As the number of joins in the query grows, the situation only gets worse.
But things are not as bad as they might seem, because a PostgreSQL developer has flexible data structures at their disposal that can encapsulate a dataset of any complexity: arrays and structures (record, composite type).
The result of the join query described above can be transformed into an elegant, strictly hierarchical structure of the form:

(
table1_column1 int,
table1_column2 varchar,
table1_column3 numeric,
table2_columns t2_columns[]
)


where t2_columns is a structure of the form
(
table2_column1 double precision,
table2_column2 timestamp
)


Thus, by increasing the nesting depth, you can pass an arbitrarily complex hierarchical structure.
In my case the application was written in PL/pgSQL, and its interface consisted of stored procedures returning exactly this kind of complexly organized data. This article is about an approach to parsing datasets serialized by PostgreSQL.
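To make the cost of the flat alternative concrete, here is a minimal Python sketch of the regrouping code an application needs when it receives a denormalized join result. The column names follow the structure above; the sample rows and the helper name regroup are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flat result of "table1 JOIN table2": the table1 columns
# are repeated on every row that carries a table2 match.
flat_rows = [
    (1, "a", 1.5, 0.1, "2012-01-01 00:00:00"),
    (1, "a", 1.5, 0.2, "2012-01-02 00:00:00"),
    (2, "b", 2.5, 0.3, "2012-01-03 00:00:00"),
]

def regroup(rows):
    """Rebuild the hierarchy that the join flattened away."""
    parent_key = itemgetter(0, 1, 2)  # the three table1 columns
    result = []
    # groupby only merges adjacent rows, so sort by the parent key first
    for (c1, c2, c3), group in groupby(sorted(rows, key=parent_key), key=parent_key):
        result.append({
            "table1_column1": c1,
            "table1_column2": c2,
            "table1_column3": c3,
            "table2_columns": [
                {"table2_column1": r[3], "table2_column2": r[4]} for r in group
            ],
        })
    return result

nested = regroup(flat_rows)
```

Every additional join adds another level to this regrouping code, which is the crutch that returning the nested structure directly avoids.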

Directly parsing the string that PostgreSQL produces when it serializes data (the same string you get by casting a record to varchar) was rejected immediately, because no tools for parsing this format are available anywhere outside the postgres core.
It did not take long to arrive at the idea of switching to a data representation for which more standardized parsing tools have already been developed.
After spending some time searching for ready-made solutions and discarding candidates such as PostgreSQL's xml type and hstore, I came to the conclusion (perhaps a mistaken one) that no method of passing data to the application suited me in every respect.

As the data representation for my reinvented wheel I chose JSON (for its compactness and because it is plain text), and implemented it as a native library for PostgreSQL (in pure C). I will not go into the internals of the resulting tool, especially since it is quite simple and anyone should be able to figure it out. Let us look at the library from a purely practical point of view. Its interface consists of a set of functions.

Examples of usage:

Serializing a structure:
select to_json( row( 10, 'Some text', 12.5, row( 'text in nested record', array[ 1, 2, 3 ] )::text_and_array, array[ 'array', 'of', 'text' ] ) )


The result of the query:
{"f1":10,"f2":"Some text","f3":12.5,"f4":{"str":"text in nested record","arr":[1,2,3]},"f5":["array","of","text"]}
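On the application side, a result like this needs no custom parser. A minimal Python sketch, assuming the JSON text has already been fetched from the result column as a string (the variable names are illustrative):

```python
import json

# The JSON text as it would arrive in the result column
payload = (
    '{"f1":10,"f2":"Some text","f3":12.5,'
    '"f4":{"str":"text in nested record","arr":[1,2,3]},'
    '"f5":["array","of","text"]}'
)

record = json.loads(payload)

# Nested records become dicts and arrays become lists,
# so the hierarchy is navigated directly.
nested_array = record["f4"]["arr"]
text_items = record["f5"]
```

The same holds for any runtime with a JSON parser; Python is used here only for brevity.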


Serializing datasets:
select json_agg( q.*, 'json_field_name1' ), json_agg_plain( q.*, 'json_field_name2' ) as q from...;


The result of the query:
{"json_field_name1":[{ ... },{ ... }, ...]} "json_field_name2":[{ ... },{ ... }, ...]
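Note that the second column is a bare "name":[...] member without enclosing braces, so it is not a complete JSON document by itself. The sketch below shows one way a client could consume both forms; it assumes the plain variant is meant to be embedded into a larger object, which the article does not state explicitly, and the sample values are made up.

```python
import json

# Hypothetical values of the two result columns
agg_column = '{"json_field_name1":[{"id":1},{"id":2}]}'
plain_column = '"json_field_name2":[{"id":3},{"id":4}]'

# The first column is a complete JSON object and parses as is
first = json.loads(agg_column)

# The plain fragment has no enclosing braces (assumption: it is meant
# to be spliced into a larger object), so wrap it before parsing
second = json.loads("{" + plain_column + "}")

# Both can then be merged into one document
combined = {**first, **second}
```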


Later the library was extended with functionality for the reverse operation: parsing JSON and populating the fields of structures.

Deserializing structures and arrays:
select from_json( 'some_record_type', '{"field1":"some text","field2":123,"field3":["this","is","array","of","text"]}' );
select arr_from_json( 'some_type[]', '[{"this is array of"},{"records with one field"}]' );


Query results:
("some text",123,"{\"this\",\"is\",\"array\",\"of\",\"text\"}")

{("this is array of"),("records with one field")}
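In the other direction, the application only has to produce JSON text for the second argument. A minimal Python sketch of preparing the from_json call shown above; the %s placeholder style is an assumption (it is the one used by drivers such as psycopg2), and actually sending the statement is left to whatever driver is in use.

```python
import json

# Application-side data that should become a some_record_type value
record = {
    "field1": "some text",
    "field2": 123,
    "field3": ["this", "is", "array", "of", "text"],
}

# separators=(",", ":") drops the optional whitespace,
# giving the same compact form as in the query above
json_arg = json.dumps(record, separators=(",", ":"))

# Pass the type name and the JSON text as parameters instead of
# pasting them into the SQL string, so quoting is left to the driver.
sql = "select from_json( %s, %s );"
params = ("some_record_type", json_arg)
```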


The serializer described here can be used together with any runtime environment that has JSON parsing tools (in my case, C++ and PHP). Benchmarks show performance comparable to PostgreSQL's built-in serializer.
Thank you for your attention; comments and constructive criticism are welcome.

Link to the library
Article based on information from habrahabr.ru
