timelink.pandas package

Pandas-related utilities for Timelink. In progress. Originally developed for the FAUC1537-1919 project; now being integrated into the timelink-py package.

Submodules

timelink.pandas.attribute_values module

Create a dataframe with the values of an attribute

timelink.pandas.attribute_values.attribute_values(the_type, attr_type=None, groupname=None, dates_between=None, db: TimelinkDatabase | None = None, session=None, sql_echo=False)[source]

Return the vocabulary of an attribute

The returned dataframe has a row for each unique value, with a ‘count’ of the number of different entities and the first and last date for that value.

Parameters:

the_type – attribute type to search for
attr_type – alias for the_type; deprecated
groupname – groupname to filter by
dates_between – tuple with two dates in format yyyy-mm-dd
db – TimelinkDatabase to use; either db or session must be specified
session – database session to use; if None will use db.session()
sql_echo – if true will print the SQL statement

To filter by dates, pass dates_between = (from_date, to_date), with dates in format yyyy-mm-dd; this returns attributes with from_date < date < to_date.
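The date filter described above uses exclusive bounds on both sides. The following is a minimal sketch of that comparison only, assuming dates are plain ISO yyyy-mm-dd strings; `in_date_window` is a hypothetical helper written for illustration and is not part of timelink.

```python
from datetime import date


def in_date_window(value: str, dates_between: tuple[str, str]) -> bool:
    """Return True when from_date < value < to_date (both bounds exclusive).

    Hypothetical helper: it only illustrates the comparison, not how
    timelink builds its SQL query.
    """
    from_date, to_date = (date.fromisoformat(d) for d in dates_between)
    return from_date < date.fromisoformat(value) < to_date


window = ("1580-01-01", "1600-12-31")
print(in_date_window("1590-06-15", window))  # True: strictly inside the window
print(in_date_window("1580-01-01", window))  # False: the bound itself is excluded
```

Because the bounds are exclusive, a date equal to from_date or to_date is not returned; widen the tuple by one day on each side if the end points should be included.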

timelink.pandas.entities_with_attribute module

timelink.pandas.entities_with_attribute.entities_with_attribute(the_type: str | List[str], the_value=None, column_name=None, entity_type='entity', show_elements=None, dates_in=None, name_like=None, filter_by=None, more_attributes=None, db: TimelinkDatabase | None = None, session: Session | None = None, sql_echo=False)[source]

Generate a pandas DataFrame of entities filtered by attributes.

Parameters:

the_type – Attribute type (string, SQL wildcard, or list of types).
the_value – Optional value filter (string, SQL wildcard, or list of strings).
column_name – Optional column name to use for the attribute values.
entity_type – Entity model to query (default “entity”).
show_elements – Entity columns to include in the result.
dates_in – Tuple (after, before) to constrain attribute dates (exclusive).
name_like – Optional SQL LIKE filter on the entity name.
filter_by – List of entity ids to include even if attributes are missing.
more_attributes – Additional attribute types to join into the result.
db – TimelinkDatabase instance (required if session is not provided).
session – SQLAlchemy session; if omitted, one is created from db.
sql_echo – When True, echo the generated SQL statements.

Returns:

pandas.DataFrame or None when no rows match. Result columns include the requested entity fields plus attribute value, date, observation, type, line, level, attr_id, groupname, and any available extra_info entries.
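The the_type and the_value parameters accept a plain string, an SQL wildcard, or a list. As a rough illustration of what an SQL wildcard matches, the sketch below emulates LIKE patterns (% for any run of characters, _ for a single character) in plain Python. `like_to_regex` and `matches_type` are hypothetical helpers, the "any item in the list matches" rule is an assumption, and case sensitivity in a real database depends on the backend.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate an SQL LIKE pattern into an equivalent compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_type(attr_type: str, the_type) -> bool:
    """Assumed semantics: a list matches when any of its patterns matches."""
    patterns = [the_type] if isinstance(the_type, str) else list(the_type)
    return any(like_to_regex(p).fullmatch(attr_type) for p in patterns)


print(matches_type("nacionalidade", "nac%"))                 # True
print(matches_type("residencia", ["nac%", "residencia"]))    # True
print(matches_type("residencia", "nac%"))                    # False
```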

timelink.pandas.group_attributes module

Return the attributes of a group of entities in a DataFrame.

timelink.pandas.group_attributes.display_group_attributes(ids, entity_type='entity', header_elements=None, header_attributes=None, sort_header=None, sort_attributes=None, category='id', cmap_name='tab20', include_attributes=None, exclude_attributes=None, db: TimelinkDatabase | None = None)[source]

Display attributes of a group with header and colored rows.

Same as group_attributes, but a header is displayed for each entity and each entity is colored. The attribute list is also colored, to make it clear which attributes are from which entity.

Parameters:

ids – list of ids
entity_type – type of entities to show
header_elements – elements of entity type to include in header (e.g. name, description, obs)
header_attributes – list of attribute types to include in header
sort_header – sort the header by this attribute
sort_attributes – sort the attributes by this attribute
include_attributes – list of attribute types to include
exclude_attributes – list of attribute types to exclude
db – a TimelinkDatabase object if None specify session
category – column to use for coloring
cmap_name – name of the colormap to use. See https://matplotlib.org/stable/tutorials/colors/colormaps.html

timelink.pandas.group_attributes.group_attributes(group: list, entity_type='entity', include_attributes=None, exclude_attributes=None, show_elements=None, db: TimelinkDatabase | None = None, session: Session | None = None, sql_echo=False)[source]

Return attributes of a group of entities in a DataFrame.

Parameters:

group – List of entity ids to include.
entity_type – Entity type to query (defaults to “entity”).
include_attributes – Attribute types to include; supports wildcards.
exclude_attributes – Attribute types to exclude.
show_elements – Entity columns to include (for example name, description, obs).
db – TimelinkDatabase instance; required if session is not provided.
session – Existing SQLAlchemy session; used when db is not provided.
sql_echo – When True, echo the generated SQL statements.

timelink.pandas.name_to_df module

timelink.pandas.name_to_df.pname_to_df(name, db: TimelinkDatabase | None = None, session=None, similar=False, name_particles=None, sql_echo=False)[source]

Return a dataframe of people with a matching name.

Parameters:

name – name to search for
db – database connection to use, either db or session must be specified
session – session to use, either db or session must be specified
similar – if true will strip particles and insert a wild card % between name components with an extra one at the end
name_particles – list, particles to remove before comparing names
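The similar option is described as stripping particles and inserting a % wildcard between name components, with an extra one at the end. A minimal sketch of that transformation, assuming whitespace-separated components and case-insensitive particle matching; `like_pattern` is a hypothetical helper, not the function timelink uses internally.

```python
def like_pattern(name: str, name_particles: list[str]) -> str:
    """Build an SQL LIKE pattern from a name, dropping the given particles."""
    particles = {p.lower() for p in name_particles}
    # Keep only the components that are not particles (assumed case-insensitive).
    components = [w for w in name.split() if w.lower() not in particles]
    # One wildcard between components, plus an extra one at the end.
    return "%".join(components) + "%"


print(like_pattern("Manuel da Costa", ["da", "de", "do"]))  # Manuel%Costa%
```

A pattern such as Manuel%Costa% would match names that differ only in particles or in extra middle components, which is presumably the point of the option.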

timelink.pandas.styles module

Utilities for styling pandas dataframes

timelink.pandas.styles.category_palette(categories, cmap_name=None)[source]

Create a color palette associated with a list of categories

Parameters:

categories – list of categories
cmap_name – matplotlib color map; defaults to Pastel2. See https://matplotlib.org/stable/tutorials/colors/colormaps.html

Returns a dict with the categories as keys and colors as values
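To make the shape of the returned dict concrete, here is a small stand-in that builds a category-to-color mapping using only the standard library. It deliberately swaps the matplotlib colormap for evenly spaced hues from `colorsys`, so the colors differ from what category_palette would produce; `sketch_palette` is a hypothetical name, and dropping duplicate categories is a choice made in this sketch.

```python
import colorsys


def sketch_palette(categories):
    """Map each distinct category to a pastel hex color (stand-in for a colormap)."""
    unique = list(dict.fromkeys(categories))  # drop duplicates, keep first-seen order
    palette = {}
    for i, cat in enumerate(unique):
        # Spread hues evenly; high lightness gives pastel-like tones.
        r, g, b = colorsys.hls_to_rgb(i / len(unique), 0.85, 0.6)
        palette[cat] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


palette = sketch_palette(["p1", "p2", "p1", "p3"])
print(sorted(palette))  # ['p1', 'p2', 'p3']
```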

timelink.pandas.styles.style_color_row_by_category(row, palette, category='id')[source]

Color row by category. Function for styling dataframes.

Usage: display(df.style.apply(style_color_row_by_category, axis=1, palette=mypalette))

Parameters:

row – this is passed by pandas when rendering the dataframe
palette – a dict that maps category values to colors
category – column that determines the row color
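A row-styling function used with the pandas Styler and axis=1 receives one row at a time and returns one CSS string per cell. The sketch below shows that contract with a plain dict standing in for the row; `color_row` is a hypothetical illustration and may differ from what style_color_row_by_category actually returns (for example, in how it treats categories missing from the palette).

```python
def color_row(row, palette, category="id"):
    """Return one CSS declaration per cell, colored by the row's category value."""
    color = palette.get(row[category], "")
    # Assumption: rows whose category is not in the palette are left unstyled.
    css = f"background-color: {color}" if color else ""
    return [css] * len(row)


palette = {"p1": "#ffd8b1", "p2": "#c8e6c9"}
row = {"id": "p1", "type": "residencia", "value": "Coimbra"}  # stand-in for a pandas row
print(color_row(row, palette))
# ['background-color: #ffd8b1', 'background-color: #ffd8b1', 'background-color: #ffd8b1']
```

With a real dataframe, the same function could be passed as df.style.apply(color_row, axis=1, palette=palette), since Styler.apply forwards extra keyword arguments to the function.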

timelink.pandas.styles.styler_row_colors(df, category='id', columns=None, palette=None, cmap_name=None)[source]

Returns a dataframe with the row color set according to a category.

Parameters:

df – dataframe
category – name of column with category that determines color, defaults to id
cmap_name – name of matplotlib color map to use; defaults to 'Pastel2'