Manage the Data

List Datasets

After you have collected some data and stored it in an HDF5 file, you can use the Database class to work with it. First, initialize the Database object and list the sample IDs stored in the file:

>>> from cohesivm.database import Database
>>> db = Database('Test.h5')
>>> db.get_sample_ids()
['test_sample_42']

This is exactly the sample_id that was specified when the Experiment was configured. Use it to retrieve the corresponding dataset path from the Database object:

>>> db.filter_by_sample_id('test_sample_42')
['/CurrentVoltageCharacteristic/55d96687ee75aa11:26464063430fe52f:a69a946e7a02e547:c8965a35118ce6fc:67a8bfb44702cfc7:8131a44cea4d4bb8/2024-07-01T10:44:59.033161-test_sample_42']

The resulting list contains the path strings for all Datasets with the specified sample_id (currently only one entry). These strings get quite long because they contain the name of the Measurement procedure, followed by a hashed representation of the settings dictionary, and finally the datetime combined with the sample_id.
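If you need the individual parts of such a path, you can split it along the layout described above. The following is a minimal sketch: split_dataset_path is a hypothetical helper, not part of cohesivm, and it assumes that the last path component always starts with an ISO 8601 datetime followed by a hyphen and the sample_id.

```python
import re
from datetime import datetime

def split_dataset_path(path):
    """Split a dataset path into its parts (hypothetical helper, not part of cohesivm)."""
    # Assumed layout: /<measurement>/<settings hash>/<datetime>-<sample_id>
    _, measurement, settings_hash, stem = path.split('/')
    # The datetime itself contains hyphens, so match the ISO 8601 prefix explicitly.
    match = re.fullmatch(r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?)-(.+)', stem)
    if match is None:
        raise ValueError(f'unexpected dataset name: {stem!r}')
    return {
        'measurement': measurement,
        'settings_hash': settings_hash,
        'datetime': datetime.fromisoformat(match.group(1)),
        'sample_id': match.group(2),
    }

# Dataset path taken from the example output above
path = ('/CurrentVoltageCharacteristic/'
        '55d96687ee75aa11:26464063430fe52f:a69a946e7a02e547:'
        'c8965a35118ce6fc:67a8bfb44702cfc7:8131a44cea4d4bb8/'
        '2024-07-01T10:44:59.033161-test_sample_42')
parts = split_dataset_path(path)
```

Here parts['measurement'] holds the name of the Measurement procedure and parts['sample_id'] the sample_id. For actual filtering, the methods of the Database class are the better choice.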

Access Metadata

With a dataset path, you can retrieve information from the Metadata object which was created by the Experiment:

>>> dataset = db.filter_by_sample_id('test_sample_42')[0]
>>> metadata = db.load_metadata(dataset)
>>> metadata.sample_id, metadata.device, metadata.interface, metadata.measurement
('test_sample_42', 'Agilent4156C', 'MA8X8', 'CurrentVoltageCharacteristic')

Creating a new Dataset by hand is more involved because it requires a fully specified Metadata object, which takes a large number of arguments. This is usually handled by the Experiment class, which also guarantees that the specified components are compatible.

As a test, you can use the Metadata object from above to initialize a new Dataset in the database:

>>> db.initialize_dataset(metadata)
'/CurrentVoltageCharacteristic/55d96687ee75aa11:26464063430fe52f:a69a946e7a02e547:c8965a35118ce6fc:67a8bfb44702cfc7:8131a44cea4d4bb8/2024-07-01T10:46:05.910371-test_sample_42'

This yields the same dataset path as before, except for the datetime.

Work with Data Entries

Adding data entries to a Dataset is simple: besides the data itself, you only need to specify the dataset path and the contact_id. In the first call below the contact_id is omitted, and the entry is stored under the contact_id '0':

>>> import numpy as np
>>> db.save_data(np.array([1], dtype=[('Quantity (Unit)', int)]), dataset)
>>> db.save_data(np.array([42], dtype=[('Quantity (Unit)', int)]), dataset, '1')

Use a structured array, i.e. provide a dtype with named fields, because this lets you store the quantity and the unit alongside the data.
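Because the quantity and the unit are part of the field names, they can be recovered later from the array itself. The following sketch illustrates this with plain NumPy. The field names 'Voltage (V)' and 'Current (A)', the values, and the split_label helper are made up for illustration and are not part of cohesivm.

```python
import re
import numpy as np

# Structured array whose field names follow the 'Quantity (Unit)' convention.
# Field names and values are illustrative only.
iv = np.array(
    [(0.0, 1.0e-6), (0.5, 2.5e-6), (1.0, 4.0e-6)],
    dtype=[('Voltage (V)', float), ('Current (A)', float)],
)

def split_label(name):
    """Split 'Quantity (Unit)' into ('Quantity', 'Unit'); hypothetical helper."""
    match = re.fullmatch(r'(.+?)\s*\((.*)\)', name)
    if match is None:
        # No unit in parentheses: return the name unchanged with an empty unit
        return name, ''
    return match.group(1), match.group(2)

# The field names are available as a tuple via dtype.names
labels = [split_label(name) for name in iv.dtype.names]

# Each field can be addressed by its full name
voltage = iv['Voltage (V)']
max_current = iv['Current (A)'].max()
```

The labels list can then be used, for example, to annotate plot axes without keeping the units in a separate place.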

Finally, you can load a data entry by specifying the contact_id, several entries by using a list of IDs, or load an entire Dataset, including the Metadata:

>>> db.load_data(dataset, '0')
array([(1,)], dtype=[('Quantity (Unit)', '<i4')])
>>> db.load_data(dataset, ['0', '1'])
[array([(1,)], dtype=[('Quantity (Unit)', '<i4')]),
 array([(42,)], dtype=[('Quantity (Unit)', '<i4')])]
>>> db.load_dataset(dataset)
({'0': array([(1,)], dtype=[('Quantity (Unit)', '<i4')]),
  '1': array([(42,)], dtype=[('Quantity (Unit)', '<i4')])},
 'Metadata(CurrentVoltageCharacteristic, Agilent4156C, MA8X8)')

To work with a structured array, you need to know the names of its fields, which are stored in the dtype property. With a field name, you can access the data of an individual field:

>>> a = np.array([(1, 42)], dtype=[('Quantity1 (Unit1)', int), ('Quantity2 (Unit2)', int)])
>>> a.dtype
dtype([('Quantity1 (Unit1)', '<i4'), ('Quantity2 (Unit2)', '<i4')])
>>> a['Quantity1 (Unit1)']
array([1])
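The same field access works on the entries of a loaded Dataset, which load_dataset() returns as a dictionary mapping each contact_id to its structured array (see the output further up). The sketch below builds such a dictionary by hand as a stand-in, so that it runs without an HDF5 file, and then collects one field across all contacts:

```python
import numpy as np

dtype = [('Quantity (Unit)', int)]

# Hand-built stand-in for the data dictionary returned by db.load_dataset(dataset)
entries = {
    '0': np.array([(1,)], dtype=dtype),
    '1': np.array([(42,)], dtype=dtype),
}

# Collect the values of one field for each contact_id
values = {contact_id: data['Quantity (Unit)'].tolist()
          for contact_id, data in entries.items()}

# Stack all entries into one structured array (requires identical dtypes)
stacked = np.concatenate(list(entries.values()))
```

After stacking, stacked['Quantity (Unit)'] holds the values of all contacts in insertion order, which is convenient for quick statistics over a whole sample.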

The Database class also implements methods for filtering datasets based on the settings of the Measurement. See the documentation of the filter_by_settings() and filter_by_settings_batch() methods to learn more.