Crash course in PostgreSQL, part 2
In part 1, we learned important PostgreSQL fundamentals. Today, we'll learn how to populate a table with data, and about important concepts like schema, normalization, views, and transactions.
Populating a table with data
In part 1, we created a small example comics table. We learned how to list all of the tables in our database, including the system tables that are always present. This command lists only the tables we created:
testdb=# \dt List of relations Schema | Name | Type | Owner --------+---------+-------+------- public | comics | table | carla
We only created the table structure, so let's populate it with data. Tables have columns and rows. Each column contains a specific data type, and each row contains your data. Don't get bogged down in row order because it doesn't matter, and you can't control it anyway. It may help to think of a database as a collection of reasonably-organized pots of data, into which you dip precise queries to retrieve whatever data you want. It's the queries that need to be specific and well-ordered, not your table rows. The INSERT statement put data into our table rows, like this:
testdb=# INSERT INTO comics VALUES ('Fabulous Furry Freak Brothers 1',
'Rip Off Press', '1971-03-15'); INSERT 0 1
You must insert your data in the correct column order. If you don't remember your column order, there are two options: one, read your table structure:
testdb=# \d comics Table "public.comics" Column | Type | Modifiers ----------------+-----------------------+----------- name | character varying(80) | publisher | character varying(80) | date_published | date |
Or list your columns, in any order, in your INSERT command, and then list your values in the same order:
testdb=# INSERT INTO comics (date_published, publisher, name)
VALUES('1971-03-15', 'Rip Off Press', 'Fabulous Furry Freak Brothers 1');
Did it work? Let's look:
testdb=# SELECT * from comics; name | publisher | date_published ---------------------------------+---------------+---------------- Fabulous Furry Freak Brothers 1 | Rip Off Press | 1971-03-15 (1 row)
By golly, it did!
What if you have a lot of data to insert? A fast way is to enter them in a plain text file, one data set per line in column order, with each field separated by a tab. Make sure there are no trailing newlines following your last line, or psql will give you a "missing data" error that references the empty lines. This example text file, freakbros.txt, contains two rows of data:
Fabulous Furry Freak Brothers 2 Rip Off Press 1973-04-19 Fabulous Furry Freak Brothers 3 Rip Off Press 1973-12-03
Populate your table with data from this file like this:
testdb=# \copy comics FROM '/home/carla/Documents/postgres/freakbros.txt'
There are several important items to note here. There are two copy commands: a SQL COPY command, and a psql \copy command. These are close siblings, but not twins. Using COPY means the postgres user must have read permissions to the text file. Using \copy means the system user who is logged into psql must have read permissions on the file; for example, the "carla" user can import from a text file in her home directory.
The COPY command is faster because it runs on the server, where the \copy command runs from the psql client to the server. On success it returns a COPY [count] message telling how many rows it copied. It also saves a fair bit of hassle because you can run scripts without having to worry about who owns your source data file.
\copy does not require a terminating semi-colon, while COPY does. You may also elect to use a different delimiter, such as a comma or a pipe symbol instead of tabs. Make sure there are no spaces on either side of your delimiter, then specify your chosen delimiter like this COPY example:
testdb=# COPY comics FROM '/etc/postgresql/9.0/main/freakbros.txt' USING DELIMITERS ',';
Getting file permissions sorted out is a common source of confusion. SELinux may get in your way and need some adjusting, and plain old Unix file permissions can sometimes bite in weird ways. The simplest way I have found to make files accessible to the postgres user is to make a special directory owned by the postgres user and group, and then make all the files in it owned by postgres. Or use a directory already owned by postgres. For example, on my Debian testing system /etc/postgresql/9.0/ is owned by postgres, so I put my files in there. Use the find command to quickly find which files and directories are owned by postgres. This example searches the whole filesystem except for the /proc pseudo-directory and network shares:
# find / -xdev \( -name proc -prune \) -o -user postgres
Exporting table data
Putting data into a PostgreSQL table is fun, and you can export it even more easily with the COPY and \copy commands. Again, COPY requires a terminating semi-colon and an output directory that the postgres user can write to, while \copy has no terminating semi-colon and must output to the logged-in psql user's directory. Here are examples of both:
testdb=# COPY comics TO '/etc/postgresql/9.0/main/freakout.txt'; testdb=# \copy comics TO '/home/carla/freakout1.txt'
You may use any of the COPY and \copy command options on exports, such as delimiters, just like when you copy data from a file into a table.
We could go on for ages on specific operations and commands, and that would be way fun. But even in online articles there are limits, and now you know enough to continue exploring different commands on your own such as updates, deletions, joins, how to handle nulls, finding specific records and fields, and so on. So now let's take a quick look at some key terminology.
Databases are chock-full of special terminology. Here are four terms you should know and understand, and in knowing, and understanding them, you'll be ahead of a lot of people who think they know databases.
Views are wonderful shortcuts for queries. A query is a request to see data, like our simple SELECT * from comics; example. Queries can grow large and complex and span multiple tables. Any query can be defined in a view, and then instead of typing a long query command simply type the view name.
Transactions are fundamental to all databases. A common criticism of MySQL is that it does not support transactions. It does, depending on which storage backend you're using, but that is a long story for another day. A transaction groups all of the steps for a particular operation into a single operation that returns success only if all the steps succeed. If a single step fails then the whole transaction fails, rather than leaving the transaction in an incomplete and erroneous state. The transaction is not reported as completed until all the steps are written to disk. For example, consider all of the steps involved in ordering a book from an online seller. You want your payment credited, book moved to the shipping queue, and an order confirmation. The seller wants their inventory updated and payment processed correctly. If any of these steps fail then a well-designed ordering system reports the error, and it must be corrected or done over.
Schema is a word you'll see a lot, and there are a lot of different ideas on what it means. Creating a formal schema for your PostgreSQL databases is optional, and it starts with the CREATE SCHEMA command. Schema is the metadata that describes all the elements of database structure: tables, operators, data types, functions, the relationships between tables, every piece of your database. There are tools for creating graphical schemas illustrating all the different relationships, which is very useful for understanding your database structure and debugging problems. In PostgreSQL schemas allow you to arbitrarily organize database objects and to give users access to arbitrary database objects, which is a nice simple and flexible way to control user access.
Normalization is one of those words that sounds important and something that needs to be done -- normalize your tables! And it is. Normalization covers a lot of ground; in a nutshell it is reducing redundancy. This makes your database more efficient, and easier to maintain. So it encompasses tasks such as designing your tables to avoid duplication, using whitespace consistently, using consistent terminology and spelling for better searches, and sanitizing user input -- which is beautifully illustrated in the XKCD comic, Exploits of a Mom, in which we meet Little Bobby Tables.