CREATE TABLE Measures (
  expId     SERIAL,
  iteration INT    NOT NULL,
  value     float4 NOT NULL,
  PRIMARY KEY (expId, iteration)
);
So, a table of various measurements, repeated for n iterations.
However, because we now have more data than originally expected, I want to move to a new table layout that uses an array column instead, which gives better overall performance (already tested and benchmarked):
CREATE TABLE TmpMeasures (
  expId    SERIAL PRIMARY KEY,
  -- VALUES is a reserved word in PostgreSQL, so the column name must be quoted
  "values" float4[] NOT NULL
);
My problem now is how to get the old data into the new format.
In the simplest case, the data may look something like this:
INSERT INTO Measures (expId, iteration, value)
VALUES
(1, 1, 1.1),
(1, 2, 2.1),
(1, 3, 3.1),
(2, 1, 1.2),
(3, 1, 1.3);
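Given that sample data, TmpMeasures should end up with one row per experiment, with the iterations collected into the array:

 expId |    values
-------+---------------
     1 | {1.1,2.1,3.1}
     2 | {1.2}
     3 | {1.3}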
And the conversion could be done with a two-step process, roughly like this: first create an empty array for each experiment, then populate it with the iteration values:
INSERT INTO TmpMeasures (expId, "values")
SELECT expId, '{}'::float4[]
FROM Measures
ON CONFLICT DO NOTHING;

UPDATE TmpMeasures tm
SET "values"[m.iteration] = m.value
FROM Measures m
WHERE tm.expId = m.expId;
However, my problem is that the UPDATE only ever seems to apply the first iteration, i.e., iteration = 1, so expId 1 ends up as {1.1} rather than {1.1,2.1,3.1}.
I don't quite understand why that is the case.
I suspect that alternative approaches to values[iteration] would group by expId, order by iteration, and aggregate the values into an array.
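Something like this, I imagine (an untested sketch that assumes iterations start at 1 and have no gaps):

INSERT INTO TmpMeasures (expId, "values")
SELECT expId, array_agg(value ORDER BY iteration)
FROM Measures
GROUP BY expId;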
Unfortunately, the data isn't perfect, but iterations should line up.
So, the following loop does seem to work, but it's extremely slow, and I don't quite understand why it's needed in the first place:
DO
$do$
BEGIN
  FOR i IN 1..(SELECT max(iteration) FROM Measures) LOOP
    UPDATE TmpMeasures tm
    SET "values"[i] = m.value
    FROM Measures m
    WHERE tm.expId = m.expId
      AND m.iteration = i;
  END LOOP;
END
$do$;
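I would have expected to be able to replace the loop with a single set-based UPDATE that pre-aggregates per expId, so that each target row matches exactly one source row. Again an untested sketch:

UPDATE TmpMeasures tm
SET "values" = agg.vals
FROM (
  SELECT expId, array_agg(value ORDER BY iteration) AS vals
  FROM Measures
  GROUP BY expId
) agg
WHERE tm.expId = agg.expId;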
Why does the "normal" update statement not suffice?
Thanks