timelink.pandas package

Pandas-related utilities for Timelink. In progress. Originally developed for the FAUC1537-1919 project; now being integrated into the timelink-py package.

Submodules

timelink.pandas.attribute_values module

Create a dataframe with the values of an attribute

timelink.pandas.attribute_values.attribute_values(the_type, attr_type=None, groupname=None, dates_between=None, db: TimelinkDatabase | None = None, session=None, sql_echo=False)[source]

Return the vocabulary of an attribute

The returned dataframe has a row for each unique value, with a 'count' of the number of different entities and the first and last date for that value.

Parameters:
  • the_type – attribute type to search for

  • attr_type – alias for the_type; deprecated

  • groupname – groupname to filter by

  • dates_between – tuple with two dates in format yyyy-mm-dd

  • db – a TimelinkDatabase object; either db or session must be specified

  • session – database session to use; if None, will use db.session()

  • sql_echo – if true, will print the SQL statement

To filter by dates, pass dates_between=(from_date, to_date) with dates in format yyyy-mm-dd. This returns attributes with from_date < date < to_date.
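To illustrate the shape of the result and the exclusive date bounds, the sketch below builds the same kind of summary from an in-memory list of records in plain Python. The record layout `(entity_id, value, date)`, the function name `vocabulary_sketch`, and the keys `first_date` and `last_date` are assumptions made for this example; this is not how timelink queries the database.

```python
from collections import defaultdict


def vocabulary_sketch(records, dates_between=None):
    """Summarise (entity_id, value, date) records per unique value.

    Hypothetical helper: dates are 'yyyy-mm-dd' strings, which sort
    chronologically when compared as text.
    """
    if dates_between is not None:
        from_date, to_date = dates_between
        # Exclusive bounds: from_date < date < to_date
        records = [r for r in records if from_date < r[2] < to_date]

    entities = defaultdict(set)   # value -> distinct entity ids
    dates = defaultdict(list)     # value -> all dates seen
    for entity_id, value, date in records:
        entities[value].add(entity_id)
        dates[value].append(date)

    return {
        value: {
            "count": len(entities[value]),
            "first_date": min(dates[value]),
            "last_date": max(dates[value]),
        }
        for value in entities
    }


records = [
    ("p1", "soure", "1650-01-10"),
    ("p1", "soure", "1655-03-02"),
    ("p2", "soure", "1660-07-21"),
    ("p3", "coimbra", "1650-01-10"),
]
summary = vocabulary_sketch(records, dates_between=("1650-01-10", "1670-01-01"))
```

Because the bounds are exclusive, the two records dated exactly 1650-01-10 are left out of `summary`.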

timelink.pandas.entities_with_attribute module

timelink.pandas.entities_with_attribute.entities_with_attribute(the_type: str | List[str], the_value=None, column_name=None, entity_type='entity', show_elements=None, dates_in=None, name_like=None, filter_by=None, more_attributes=None, db: TimelinkDatabase | None = None, session: Session | None = None, sql_echo=False)[source]

Generate a pandas dataframe with entities with a given attribute

Parameters:
  • the_type – type of attribute, can have SQL wildcards, string, or list

  • the_value – if present, limit to this value, can be SQL wildcard

  • entity_type – if present, limit to this entity type, string

  • column_name – if present, use this name for the attribute column, otherwise use the_type; useful when the_type is a list

  • name_like – if present, limit to this name, can have SQL wildcards

  • show_elements – List of entity elements to add to the dataframe

  • dates_in – (after,before) if present only between those dates (exclusive)

  • filter_by – list of ids, limit to these entities

  • more_attributes – add more attributes if available

  • db – A TimelinkDatabase object

  • session – A SQLAlchemy session, if None will use db.session()

  • sql_echo – if True echo the sql generated

Example

    # name, sex and function of people living in the same place
    neighbors = entities_with_attribute(
        entity_type="person",
        show_elements=["groupname", "names", "sex"],
        the_type="residencia",
        the_value="soure",
        column_name="local",  # use this instead of "residencia"
        more_attributes=["profissao"],
        db=dbsystem,
    )
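Since the_type, the_value and name_like accept SQL wildcards, it may help to see how such patterns match. The sketch below uses the standard-library sqlite3 module with a made-up `attributes` table; the table name, columns and rows are assumptions for illustration only and do not reflect timelink's actual schema.

```python
import sqlite3

# Hypothetical table standing in for stored attributes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attributes (entity TEXT, the_type TEXT, the_value TEXT)")
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [
        ("p1", "residencia", "soure"),
        ("p2", "residencia", "coimbra"),
        ("p3", "profissao", "sapateiro"),
        ("p4", "residencia-anterior", "soure"),
    ],
)

# '%' matches any run of characters, '_' matches exactly one character.
rows = conn.execute(
    "SELECT entity FROM attributes "
    "WHERE the_type LIKE ? AND the_value LIKE ? ORDER BY entity",
    ("residencia%", "sour_"),
).fetchall()
matches = [entity for (entity,) in rows]
conn.close()
```

Here `matches` holds the entities whose attribute type starts with "residencia" and whose value is "sour" followed by one more character.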

Ideas:
Add:
  • the_value_in – list of values

  • the_value_between_inc – (min, max), get >= min and <= max

  • the_value_between_exc – (min, max), get > min and < max

timelink.pandas.group_attributes module

Return the attributes of a group of entities in a dataframe.

timelink.pandas.group_attributes.display_group_attributes(ids, entity_type='entity', header_elements=None, header_attributes=None, sort_header=None, sort_attributes=None, category='id', cmap_name='tab20', include_attributes=None, exclude_attributes=None, db: TimelinkDatabase | None = None)[source]

Display attributes of a group with header and colored rows.

Same as group_attributes, but a header is displayed for each entity and each entity is colored. The attribute list is also colored, to make it clear which attributes are from which entity.

Parameters:
  • ids – list of ids

  • entity_type – type of entities to show

  • header_elements – elements of entity type to include in header (e.g. name, description, obs)

  • header_attributes – list of attribute types to include in header

  • sort_header – sort the header by this attribute

  • sort_attributes – sort the attributes by this attribute

  • include_attributes – list of attribute types to include

  • exclude_attributes – list of attribute types to exclude

  • db – a TimelinkDatabase object; if None, specify session

  • category – column to use for coloring

  • cmap_name – name of the colormap to use. See https://matplotlib.org/stable/tutorials/colors/colormaps.html

timelink.pandas.group_attributes.group_attributes(group: list, entity_type='entity', include_attributes=None, exclude_attributes=None, show_elements=None, db: TimelinkDatabase | None = None, session: Session | None = None, sql_echo=False)[source]

Return the attributes of a group of entities in a dataframe.

Parameters:
  • group – list of ids

  • entity_type – type of entities to show

  • show_elements – elements of entity type to include (e.g. name, description, obs)

  • include_attributes – list of attribute types to include

  • exclude_attributes – list of attribute types to exclude

  • db – a TimelinkDatabase object; if None, specify session

  • session – a SQLAlchemy session object; if None, specify db

  • sql_echo – if True, echo the SQL generated
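The interplay of include_attributes and exclude_attributes can be sketched in plain Python. The semantics assumed here (None means "no restriction", and an excluded type is dropped even if it is also included) are a guess for illustration, and `filter_attributes` is a hypothetical helper, not part of timelink.

```python
def filter_attributes(rows, include_attributes=None, exclude_attributes=None):
    """Keep (entity_id, attr_type, value) rows allowed by the two lists.

    Assumed semantics: None means the corresponding filter is not applied.
    """
    kept = []
    for row in rows:
        attr_type = row[1]
        if include_attributes is not None and attr_type not in include_attributes:
            continue  # not in the allow-list
        if exclude_attributes is not None and attr_type in exclude_attributes:
            continue  # explicitly excluded
        kept.append(row)
    return kept


rows = [
    ("p1", "residencia", "soure"),
    ("p1", "profissao", "sapateiro"),
    ("p2", "residencia", "coimbra"),
    ("p2", "obs", "nota"),
]
only_residence = filter_attributes(rows, include_attributes=["residencia"])
without_obs = filter_attributes(rows, exclude_attributes=["obs"])
```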

timelink.pandas.name_to_df module

timelink.pandas.name_to_df.pname_to_df(name, db: TimelinkDatabase | None = None, session=None, similar=False, name_particles=None, sql_echo=False)[source]

Return a dataframe of people with a matching name

Parameters:
  • name – name to search for

  • db – database connection to use, either db or session must be specified

  • session – session to use, either db or session must be specified

  • similar – if true will strip particles and insert a wild card % between name components with an extra one at the end

  • name_particles – list, particles to remove before comparing names
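The effect of similar=True, as described above, can be sketched as building a SQL LIKE pattern from the name. The function name `similar_pattern` and the default particle list are assumptions made for this example; timelink's own defaults and matching rules may differ.

```python
def similar_pattern(name, name_particles=("de", "da", "do", "das", "dos", "e")):
    """Build a LIKE pattern: drop particles, join components with '%'.

    Hypothetical helper; the particle list is illustrative only.
    """
    components = [
        part for part in name.split() if part.lower() not in name_particles
    ]
    # A wildcard between components, plus an extra one at the end.
    return "%".join(components) + "%"


pattern = similar_pattern("João da Silva")
```

A pattern built this way also matches longer names that share the same components in the same order, such as a name with an extra surname in the middle or at the end.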

timelink.pandas.styles module

Utilities for styling pandas dataframes

timelink.pandas.styles.category_palette(categories, cmap_name=None)[source]

Create a color palette associated with a list of categories

Parameters:
  • categories – list of categories

  • cmap_name – name of the matplotlib colormap to use

Returns:
  a dict with the categories as keys and colors as values
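A palette of this kind (a dict mapping each category to a color) can be sketched without matplotlib. The version below is only an illustration: it spaces hues evenly with the standard-library colorsys module instead of sampling a named colormap, and `category_palette_sketch` is a hypothetical name.

```python
import colorsys


def category_palette_sketch(categories):
    """Map each distinct category to a hex color string.

    Stand-in for a colormap lookup: hues are spread evenly around the
    color wheel, with fixed lightness and saturation for a pastel look.
    """
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    palette = {}
    for index, category in enumerate(unique):
        hue = index / len(unique)
        red, green, blue = colorsys.hls_to_rgb(hue, 0.85, 0.6)
        palette[category] = "#{:02x}{:02x}{:02x}".format(
            round(red * 255), round(green * 255), round(blue * 255)
        )
    return palette


palette = category_palette_sketch(["p1", "p2", "p1", "p3"])
```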

timelink.pandas.styles.style_color_row_by_category(row, palette, category='id')[source]

Color row by category. Function for styling dataframes.

Usage:

    display(df.style.apply(style_color_row_by_category, axis=1, palette=mypalette))

Parameters:
  • row – this is passed by pandas when rendering the dataframe

  • palette – a dict that maps category values to colors

  • category – column that determines the row color
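When a function is passed to pandas' Styler.apply with axis=1, it receives each row and is expected to return one CSS string per cell. The sketch below shows a function of that shape; it is an illustration written for this page, not the source of style_color_row_by_category, and it is exercised here with a plain dict standing in for the row so that it runs without pandas.

```python
def color_row_by_category(row, palette, category="id"):
    """Return one CSS declaration per cell, colored by row[category].

    Hypothetical re-implementation for illustration; rows whose category
    is missing from the palette are left unstyled.
    """
    color = palette.get(row[category], "")
    css = f"background-color: {color}" if color else ""
    return [css] * len(row)


palette = {"p1": "#f0c2c2", "p2": "#c2f0c2"}
# A dict stands in for the pandas Series that Styler.apply would pass.
row = {"id": "p2", "the_type": "residencia", "the_value": "soure"}
styles = color_row_by_category(row, palette)
```

Every cell of the row gets the same declaration, which is what makes the whole row appear in the category's color.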

timelink.pandas.styles.styler_row_colors(df, category='id', columns=None, palette=None, cmap_name=None)[source]

Return a dataframe with the row color set according to a category

Parameters:
  • df – dataframe

  • category – name of column with category that determines color, defaults to id

  • cmap_name – name of matplotlib color map to use, defaults to 'Pastel2'