Manage the Data
After you collected some data and stored it in an HDF5 file, you can use the Database class
to work with it. First, initialise the Database object and list the samples which are stored
there:
>>> from cohesivm.database import Database
>>> db = Database('Test.h5')
>>> db.get_sample_ids()
['test_sample_42']
This is exactly the sample_id which was specified when the
Experiment was configured, and it can be used to retrieve the actual
dataset path in the Database object:
>>> db.filter_by_sample_id('test_sample_42')
['/CurrentVoltageCharacteristic/55d96687ee75aa11:26464063430fe52f:a69a946e7a02e547:c8965a35118ce6fc:67a8bfb44702cfc7:8131a44cea4d4bb8/2024-07-01T10:44:59.033161-test_sample_42']
The resulting list contains the path strings for all experiments with the specified
sample_id (currently only one entry). These strings get quite long because they
contain the name of the Measurement
procedure, followed by a hashed representation of the settings dictionary,
and finally the datetime combined with the sample_id. With this
dataset path, you may retrieve some information from the
Metadata object which got created by the Experiment:
>>> dataset = db.filter_by_sample_id('test_sample_42')[0]
>>> metadata = db.load_metadata(dataset)
>>> metadata.sample_id, metadata.device, metadata.interface, metadata.measurement
('test_sample_42', 'Agilent4156C', 'MA8X8', 'CurrentVoltageCharacteristic')
Storing a new dataset is less trivial because you need a fully qualified Metadata object,
which asks for a large number of arguments. Anyway, this is usually handled by the
Experiment class because it guarantees that the specified components are compatible. For
testing, the Metadata object from above may be used to initialize a new dataset:
>>> db.initialize_dataset(metadata)
'/CurrentVoltageCharacteristic/55d96687ee75aa11:26464063430fe52f:a69a946e7a02e547:c8965a35118ce6fc:67a8bfb44702cfc7:8131a44cea4d4bb8/2024-07-01T10:46:05.910371-test_sample_42'
This yields practically the same dataset path as before, only the datetime is different. Adding data entries, on
the other hand, is fairly simple because you only need to specify the dataset and the contact_id (alongside
the data of course):
>>> db.save_data(np.array([1]), dataset)
>>> db.save_data(np.array([42]), dataset, '1')
Finally, you may load a data entry by specifying the contact_id (or a list of several) or load an entire dataset,
including the Metadata:
>>> db.load_data(dataset, '0')
[array([1])]
>>> db.load_data(dataset, ['0', '1'])
[array([1]), array([42])]
>>> db.load_dataset(dataset)
({'0': array([1]), '1': array([42])}, Metadata(CurrentVoltageCharacteristic, Agilent4156C, MA8X8))
The Database class also implements methods for filtering datasets based on
settings of the Measurement. Check out the
documentation of the filter_by_settings() and
filter_by_settings_batch() to learn more.