PostgreSQL

Ubuntu 19.04 installs PostgreSQL 11.5 (latest stable release in August, 2019)

PostgreSQL is a relational database management system (RDBMS), usually available on 5432/tcp (if it's installed and started). To find if PostgreSQL service is installed, try

systemctl status postgresql

or

/etc/init.d/postgresql status

If there is no service, there still could be a PostgreSQL client, try

psql

or

man psql

You can get a useful info about running server with ss (socket statistics):

ss -nlt

It shows the listening TCP sockets, and if PostgreSQL server is running, you will find a line like

LISTEN     0    128     127.0.0.1:5432      *:*

To be exact, 127.0.0.1:5432 means that server accepts requests from the local users only, i.e., from apps running on the same computer and connecting through the local loop-back interface (127.0.0.1), no access from the network. It's a default setting after installation due to safety reasons. See below how to make database available through the net.

Installation

First of all, run

sudo apt update

to update your local package index.

As usually with SQL databases, there is a server part and a client part. The following cmd installs everything (i.e., the latest version available in your Ubuntu repository, common add-ons, etc):

sudo apt install postgresql postgresql-contrib

The installation procedurte creates postgres Linux account. Note that postgres is also default database name and main database user.

The basic PostgreSQL client (and admin) app is psql (PostgreSQL interactive terminal, aka PostgreSQL front-end). It's a cmdline app. If you only want to connect to external servers, you don't have to install the server part - the client package would be enough:

sudo apt install postgresql-client

The main config file is /etc/postgresql/X.X/main/postgresql.conf (X.X - version numbers). Among other things, this config file sets the location of the database files.

Startup/Shutdown

There is a "simple" way and an "expert" way to startup and shutdown PostgreSQL database server (or call it "service" - the difference is unessential in this context).

Here is the "simple" way. In systems with systemd, you can use the following cmds (in mosts cases with sudo):

systemctl start postgresql

systemctl stop postgresql

systemctl restart postgresql

systemctl status postgresql

systemctl disable postgresql

systemctl enable postgresql

The last two cmds control boot policy: to start or not the PostgreSQL service at system boot.

You can also handle server using its init script:

/etc/init.d/postgresql start

/etc/init.d/postgresql stop

/etc/init.d/postgresql restart

/etc/init.d/postgresql reload

/etc/init.d/postgresql force-reload

/etc/init.d/postgresql status

The "expert" and subtle way to handle PostgreSQL assumes the use of pg_ctl utility. In fact, it can become the preferential way if you have to manage multiple server instances, clusters, and other non-trivial database configurations. See manual for details:

man pg_ctl

Connecting

To switch from your regular Linux account to postgres Linux account and to connect the database [local, on the same computer] as postgres (a privileged database user),

sudo -u postgres psql postgres

When psql successfully connects to the database server you will see the prompt:

postgres=#

The psql prompt includes the name of the currently connected database and a special indicator: # means that you are a database superuser and you are not subject to access controls; > means you are a regular database user.

In general, to connect to a database you should use something like this:

psql -h hostname -p port -U username dbname

psql -h server.domain database user

sudo -u postgres psql -h localhost -U appuser testdb

If you omit the -h, client connects via a Unix-domain socket to a server on the local host (or via TCP/IP to localhost on the machines without Unix-domain sockets). The default port is 5432, and the username is actually a role name; by default, it's OS username (user's Linux account name).

Every connection to a database server is made using the name of a particular role which defines the initial access privs for cmds issued in that connection. The role name for a particular db connection is specified by the client initiating a connection request in an application-specific fashion, e.g., psql uses -U option.

Users, roles, groups

Before PostgreSQL 8.1, users and groups were distinct kinds of entities, but now there are only roles which are used to manage database access permissions. Roles can own db objects and assign privs on those objects to other roles. It's also possible to grant membership in a role to another role.

Database roles are completely separate from OS users, though sometimes it may be reasonable to maintain a correspondence. Database roles are global across a database cluster.

CREATE ROLE name;

DROP ROLE name;

Note that Linux cmdline apps createuser and dropuser are just wrappers around those SQL cmds.

A freshly initialized database system always contains one predefined role: superuser. By default it has the same name as the OS user that initialized the database cluster, usually it's postgres. To create other roles you first have to connect as this initial role.

Only roles created with LOGIN attrib can be used as the initial role (username) for a database connection. In fact, such role is a "database user" in old sense. The following two cmds are equivalent:

CREATE ROLE name LOGIN;

CREATE USER name;

Other examples:

CREATE ROLE name CREATEDB;

CREATE ROLE name CREATEROLE;

CREATE ROLE name REPLICATION LOGIN;

CREATE ROLE name PASSWORD 'string';

The following permissions and attributes can be used with CREATE ROLE:

Using psql

Besides the standard SQL, you can use special (internal, meta-) cmds. The following list includes most popular special cmds:

\q

close db connection (i.e., exit psql);

\?

list all available command;

\c dbname

connect to a specified database;

\cd dir

change current working dir;

\conninfo

describe the current database connection;

\d

list tables in database;

\d tablename

describe the specified table;

\d+

list all tables in database with some additional info;

\d+ tablename

describe the specified table with additional info;

\dn

list all schemas;

\dn+

list all schemas with additional info;

\dp table

show info about existing privs for the specified table;

\du

list existing roles;

\dx

list installed PostgreSQL extensions;

\e

start text editor inside psql;

\encoding encoding

set the client character set encoding;

\i fname.sql

read and execute statements from the specified file;

\h

list all cmds for which syntax help is available;

\help cmd

syntax help on the specified cmd;

\l

list all databases;

\l+

list all databases with additional info;

\p

print the current query buffer to stdout;

\password username

setting/changing password for the specified user;

\r

reset (clear) the query buffer;

\timing

switch on the timing of query results (how long each SQL stmt takes, in milliseconds); to switch off add 'off';

SQL examples

Simple SQL statements:

SELECT version();

SELECT current_date;

SELECT rolname FROM pg_roles;

SELECT datname FROM pg_database;

CREATE TABLE emp (
id INT,
first_name VARCHAR(20),
last_name VARCHAR(20),
dept INT,
...);

DROP TABLE messages;

INSERT INTO emp VALUES (1, 'Scott', 'Pilgrim', ...);

INSERT INTO emp (id, first_name, last_name) VALUES (2, 'James', 'Bond');

INSERT INTO users (username, password, created, modified, enabled) VALUES ('user', 'abracadabra', '2019-11-18', '2019-11-18', true);

UPDATE users SET password = (SELECT password FROM app_user WHERE id = 8) WHERE id = 1;

DELETE FROM user_role;

COPY emp FROM '/home/user/tmp/emp-data.txt';

SELECT * FROM emp WHERE id = 2;

SELECT * FROM emp ORDER BY last_name;

ALTER TABLE users OWNER TO appuser;

ALTER TABLE user_role DROP CONSTRAINT user_role_fk1;

ALTER TABLE user_role ADD CONSTRAINT user_role_fk1 FOREIGN KEY (user_id) REFERENCES users;

An SQL script (filename schema-postgresql.sql):

CREATE SCHEMA IF NOT EXISTS test AUTHORIZATION appuser;

-- To use Hibernate and PostgreSQL together for
-- storing large binary data, your best bet is to
-- use a consistent style for defining the column
-- datatypes, i.e, pick either "bytea" or "oid" and
-- use it consistently);
-- set hibernate.jdbc.use_streams_for_binary
-- appropriately, based on which style you chose
-- (false for oid, true for bytea);
-- NOTE! If you add @Lob annotation,
--     hibernate will generate an oid;

CREATE TABLE category (
id INTEGER CONSTRAINT category_pk PRIMARY KEY,
name VARCHAR(32) NOT NULL);


CREATE TABLE persons (
id BIGSERIAL CONSTRAINT persons_pk PRIMARY KEY,
first_name VARCHAR(32) NOT NULL,
last_name VARCHAR(32) NOT NULL,
category_id INTEGER,
misc_info VARCHAR(4000),
birth_date DATE,
pass_date DATE,
picture OID);


CREATE TABLE app_user (
id SERIAL CONSTRAINT app_user_pk PRIMARY KEY,
username VARCHAR(32) NOT NULL,
password VARCHAR(60) NOT NULL);


CREATE TABLE role (
id INTEGER CONSTRAINT role_pk PRIMARY KEY,
name VARCHAR(32) NOT NULL);


CREATE TABLE user_role (
user_id INTEGER NOT NULL,
role_id INTEGER NOT NULL,
CONSTRAINT user_role_pk PRIMARY KEY (user_id, role_id));


alter table app_user add constraint app_user_uk1 unique (username);
alter table category add constraint category_uk1 unique (name);
alter table role add constraint role_uk1 unique (name);
alter table persons add constraint persons_fk1 foreign key (category_id) references category;
alter table user_role add constraint user_role_fk1 foreign key (user_id) references app_user;
alter table user_role add constraint user_role_fk2 foreign key (role_id) references role;

-- GRANT ALL PRIVILEGES ON category TO appuser;
-- GRANT ALL PRIVILEGES ON persons TO appuser;
-- GRANT SELECT, INSERT, UPDATE, DELETE ON category TO appuser;
-- GRANT SELECT, INSERT, UPDATE, DELETE ON persons TO appuser;

-- GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO appuser;

-- ALTER DEFAULT PRIVILEGES IN SCHEMA public
--     GRANT USAGE, SELECT ON SEQUENCES TO appuser;

Data types

PostgreSQL supports the following SQL data types:

  int
  smallint
  real
  double precision
  char(N)
  varchar(N)
  date
  time
  timestamp
  interval
  ...

and other types of general purpose, a set of geometric types, user-defined data types, etc. Type names are not key words, except where it's required to support special cases in the SQL standard.

PostgreSQL-related Linux apps

Some PostgreSQL-related tasks are implemented as Linux cmds. For example, the following cmds can be used to create a new user or a new database, to drop an existing database or user:

sudo -u postgres createuser zorro

sudo -u postgres createdb testdb

sudo -u postgres createdb -O zorro mydb

sudo -u postgres dropdb mydb

sudo -u postgres dropuser batman

As you see, all above cmds switch to postgres account, because you cannot create or drop database unless you have the appropriate privileges.

PostgreSQL user names are separate from OS user accounts. When you connect to a database, you can choose what PostgreSQL user name to connect as; if you don't, it will default to the same name as your current OS account. There is always a PostgreSQL user account that has the same name as OS user that started the server, and this user always has permission to create databases. Instead of logging in as that user, you can specify the -U option to select another PostgreSQL user name to connect with. All these cmds accept standard connection options:

There are also many other options, see Linux mans for these cmds.

Installing Adminpack

To install PostgreSQL Adminpack, enter the following cmd in psql prompt:

postgres=# CREATE EXTENSION adminpack;

CREATE EXTENSION

You can check installed modules with the following cmd:

postgres=# select * from pg_available_extensions;

Auth, access, etc

By default, TCP/IP connection is disabled, and remote users cannot access the database. To remove this restriction, edit /etc/postgresql/X.X/main/postgresql.conf file:

listen_addresses = '*'
port = 5432

Restart postgresql service to activate changes, and try

ss -nlt

If you've done everything right, it would display

LISTEN     0    128     0.0.0.0:5432      0.0.0.0:*

And do not forget about your firewall, make sure it allows to accept connection requests on 5432/tcp.

If you value your data, you should use at least MD5 authentication which requires the clients to supply an MD5-encrypted passwords for authentication. To set this restriction, edit /etc/postgresql/X.X/main/pg_hba.conf file.

Note that the network range 192.168.1.0/24 is given as an example, and there is no corresponding range for IPv6 in assumption that clients would not use IPv6 (often, but not always, true)!

[...]
# TYPE  DATABASE  USER  ADDRESS     METHOD

# "local" is for Unix domain socket ... only
local   all       all                   md5
# IPv4 local connections:
host    all       all   127.0.0.1/32    md5
host    all       all   192.168.1.0/24  md5
# IPv6 local connections:
host    all       all   ::1/128         md5
[...]

Restart postgresql service to activate changes.

If you still cannot connect and receive an error message like

  FATAL: no pg_hba.conf entry for host "postresql-client", ...

go back to /etc/postgresql/X.X/main/pg_hba.conf file and try to add a line like this:

host    all      all      0.0.0.0/0  trust

Moving database files

You can get info about the current data dir from the config file:

/etc/postgresql/X.X/main/postgresql.conf

or from the database itself:

sudo -u postgres psql

...

postgres=# show data_directory;

...

postgres=# \q

Choose new location and shutdown database (it's required to ensure the data integrity!):

sudo service postgresql stop

...

sudo service postgresql status

or (for Linux with systemd)

sudo systemctl stop postgresql

...

When database is successfully shut down copy current data dir to a new location (you should not create new dir manually, leave it to rsync):

sudo rsync -av /var/lib/postgresql /u02

Now you must edit config file (see above):

data_directory = '/u02/postgresql/X.X/main'

Start database to see if it's O.K. and it really uses new location:

sudo service postgresql start

...

sudo service postgresql status

or (in case ...)

sudo systemctl start postgresql

...

sudo systemctl status postgresql

...

sudo -u postgres psql

...

postgres=# show data_directory;

...

At the old location you can rename main to something like main.bak to avoid confusion in future, or (maybe after some reasonable wait time) delete old stuff.

Backup / restore

To backup a single database use pg_dump (Linux app), to backup the whole database (i.e., database cluster) use pg_dumpall (Linux app).

To backup a single database in plain text format (which is actually SQL script):

sudo -u postgres pg_dump testdb > testdb_bkp.sql

(you should do it on behalf of DBA, i.e. database superuser). To restore testdb using this backup file, you must first create an empty database with the appropriate name, then exec:

sudo -u postgres psql testdb < testdb_bkp.sql

An alternative restore cmd:

sudo -u postgres psql -d testdb -f testdb_bkp.sql

There is a slightly different approach: backup using a custom-format archive file:

sudo -u postgres pg_dump -Fc testdb > testdb_bkp.dump

It provides a compressed file, and allows to restore separate tables. However, in this case you must restore like this:

sudo -u postgres pg_restore -d testdb testdb_bkp.dump

Once again, you have to create testdb beforehand.

Since pg_dump does not save info about roles or tablespaces (it's a cluster-wide stuff), PostgreSQL developers provide a utility pg_dumpall which saves the entire contents of a database cluster - all databases + cluster-wide data:

sudo -u postgres pg_dumpall > full_db.bkp

To restore the whole database cluster:

sudo -u postgres psql -f full_db.bkp postgres

You should specify some existing database name to start from, but if you are loading into an empty cluster, use postgres. It's important to have database superuser access when restoring a pg_dumpall dump. Also, if you use tablespaces, make sure that the tablespace paths in the dump are appropriate for the new installation.

Concepts

A database is a named collection of SQL (database) objects. A collection of databases managed by a single PostgreSQL server instance constitutes a database cluster.

Every database object (table, function, ...) belongs to one and only one database. But there are some system catalogs, e.g., pg_database, that belong to a whole cluster and are accessible from each database within the cluster. In general, the hierarchy looks like this: server, database, schema, table (or some other kind of object).

In a connection request client must specify the name of the database, it's not possible to access more than one database per connection. However, app is not restricted in the number of connections it opens to the same or other databases.

Databases are physically separated and access control is managed at the connection level. If one PostgreSQL server instance houses projects or users that should be separate and for the most part unaware of each other, put them into separate databases. If the projects or users are interrelated and should be able to use each other's resources, put them in the same database but into separate schemas. Schema is a purely logical structure, who can access what is managed by the privilege system.

Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column has a specific data type.

PostgreSQL tablespace allows db admin to define location in the file system where the files representing database objects can be stored, e.g. (on behalf of database superuser):

CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';

The location must be an existing, empty dir owned by the PostgreSQL OS user. All objects subsequently created within the tablespace will be stored in files underneath this dir.