timelink.pandas package
Pandas-related utilities for Timelink. In progress. Originally developed for the FAUC1537-1919 project. Now being integrated into the timelink-py package.
Submodules
timelink.pandas.attribute_values module
Create a dataframe with the values of an attribute
- timelink.pandas.attribute_values.attribute_values(the_type, attr_type=None, groupname=None, dates_between=None, db: TimelinkDatabase | None = None, session=None, sql_echo=False)[source]
Return the vocabulary of an attribute
The returned dataframe has a row for each unique value, a 'count' column with the number of different entities, and the first and last date for that value.
- Parameters:
the_type – attribute type to search for
attr_type – alias for the_type, deprecated
db – a TimelinkDatabase object; either db or session must be specified
session – database session to use; if None will use db.session()
groupname – groupname to filter by
dates_between – tuple with two dates in format yyyy-mm-dd
sql_echo – if true will print the SQL statement
To filter by dates, pass dates_between = (from_date, to_date), with dates in format yyyy-mm-dd; this returns attributes with from_date < date < to_date.
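The summary described above can be sketched in plain Python. This is only an illustration of the shape of the result and of the exclusive date filter, not the library's implementation: the sample rows, the helper name `vocabulary`, and the keys `date_min` and `date_max` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample rows: (entity_id, attribute_value, date as yyyy-mm-dd)
rows = [
    ("p1", "soure", "1650-03-01"),
    ("p2", "soure", "1655-07-12"),
    ("p1", "soure", "1660-01-20"),
    ("p3", "coimbra", "1652-05-05"),
]


def vocabulary(rows, dates_between=None):
    """Summarise each value: distinct-entity count, first and last date."""
    entities = defaultdict(set)
    dates = defaultdict(list)
    for entity_id, value, date in rows:
        if dates_between is not None:
            from_date, to_date = dates_between
            # ISO dates compare correctly as strings; both bounds are exclusive
            if not (from_date < date < to_date):
                continue
        entities[value].add(entity_id)
        dates[value].append(date)
    return {
        value: {
            "count": len(entities[value]),    # number of different entities
            "date_min": min(dates[value]),    # assumed key name
            "date_max": max(dates[value]),    # assumed key name
        }
        for value in entities
    }


summary = vocabulary(rows)
filtered = vocabulary(rows, dates_between=("1651-01-01", "1659-12-31"))
```

Because the dates are ISO-formatted strings, a plain string comparison is enough to apply the from_date < date < to_date rule in this sketch.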
timelink.pandas.entities_with_attribute module
- timelink.pandas.entities_with_attribute.entities_with_attribute(the_type: str | List[str], the_value=None, column_name=None, entity_type='entity', show_elements=None, dates_in=None, name_like=None, filter_by=None, more_attributes=None, db: TimelinkDatabase | None = None, session: Session | None = None, sql_echo=False)[source]
Generate a pandas dataframe with entities with a given attribute
- Parameters:
the_type – type of attribute, can have SQL wildcards, string, or list
the_value – if present, limit to this value, can have SQL wildcards
entity_type – if present, limit to this entity type, string
column_name – if present, use this name for the attribute column, otherwise use the_type; useful when the_type is a list
name_like – if present, limit to this name, can have SQL wildcards
show_elements – List of entity elements to add to the dataframe
dates_in – (after,before) if present only between those dates (exclusive)
filter_by – list of ids, limit to these entities
more_attributes – add more attributes if available
db – A TimelinkDatabase object
session – A SQLAlchemy session, if None will use db.session()
sql_echo – if True echo the sql generated
Example
    from timelink.pandas.entities_with_attribute import entities_with_attribute

    # name, sex and function of people living in the same place
    # dbsystem is a TimelinkDatabase object created elsewhere
    neighbors = entities_with_attribute(
        entity_type="person",
        show_elements=["groupname", "names", "sex"],
        the_type="residencia",
        the_value="soure",
        column_name="local",  # use this instead of "residencia"
        more_attributes=["profissao"],
        db=dbsystem,
    )
- Ideas:
- Add:
the_value_in – list of values
the_value_between_inc – (min, max), get >= min and <= max
the_value_between_exc – (min, max), get > min and < max
timelink.pandas.group_attributes module
Return the attributes of a group of entities in a dataframe.
- timelink.pandas.group_attributes.display_group_attributes(ids, entity_type='entity', header_elements=None, header_attributes=None, sort_header=None, sort_attributes=None, category='id', cmap_name='tab20', include_attributes=None, exclude_attributes=None, db: TimelinkDatabase | None = None)[source]
Display attributes of a group with header and colored rows.
Same as group_attributes, but a header is displayed for each entity and each entity is colored. The attribute list is also colored, to make it clear which attributes are from which entity.
- Parameters:
ids – list of ids
entity_type – type of entities to show
header_elements – elements of entity type to include in header (e.g. name, description, obs)
header_attributes – list of attribute types to include in header
sort_header – sort the header by this attribute
sort_attributes – sort the attributes by this attribute
include_attributes – list of attribute types to include
exclude_attributes – list of attribute types to exclude
db – a TimelinkDatabase object if None specify session
category – column to use for coloring
cmap_name – name of the colormap to use. See https://matplotlib.org/stable/tutorials/colors/colormaps.html
- timelink.pandas.group_attributes.group_attributes(group: list, entity_type='entity', include_attributes=None, exclude_attributes=None, show_elements=None, db: TimelinkDatabase | None = None, session: Session | None = None, sql_echo=False)[source]
Return the attributes of a group of entities in a dataframe.
- Parameters:
group – list of ids
entity_type – type of entities to show
show_elements – elements of entity type to include (e.g. name, description, obs)
include_attributes – list of attribute types to include
exclude_attributes – list of attribute types to exclude
db – a TimelinkDatabase object; if None, specify session
session – a SQLAlchemy session object; if None, specify db
sql_echo – if True echo the SQL generated
timelink.pandas.name_to_df module
- timelink.pandas.name_to_df.pname_to_df(name, db: TimelinkDatabase | None = None, session=None, similar=False, name_particles=None, sql_echo=False)[source]
Return a dataframe of people with a matching name
- Parameters:
name – name to search for
db – database connection to use, either db or session must be specified
session – session to use, either db or session must be specified
similar – if true will strip particles and insert a wild card % between name components with an extra one at the end
name_particles – list, particles to remove before comparing names
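The effect of similar=True, as described above, can be sketched as building a SQL LIKE pattern. The helper name `like_pattern` and the default particle list below are assumptions for illustration; the particles actually used by pname_to_df when name_particles is None are not stated here.

```python
def like_pattern(name, name_particles=("de", "da", "do", "das", "dos", "e")):
    """Build a SQL LIKE pattern from a personal name.

    Particles are dropped, the remaining components are joined with '%',
    and an extra '%' is appended at the end.
    """
    particles = {p.lower() for p in name_particles}
    components = [w for w in name.split() if w.lower() not in particles]
    return "%".join(components) + "%"


pattern = like_pattern("Manuel da Costa")
```

A pattern built this way would also match longer names such as "Manuel Rodrigues da Costa Pereira", which is the looser matching the similar option describes.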
timelink.pandas.styles module
Utilities for styling pandas dataframes
- timelink.pandas.styles.category_palette(categories, cmap_name=None)[source]
Create a color palette associated with a list of categories
- Parameters:
categories – list of categories
cmap_name – matplotlib color map, defaults to Pastel2; see https://matplotlib.org/stable/tutorials/colors/colormaps.html
Returns a dict with the categories as keys and colors as values
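The shape of the returned dict can be illustrated with a small standard-library sketch. It is not the documented function: `simple_palette` is a hypothetical helper, the hard-coded colors stand in for a matplotlib color map, and whether category_palette de-duplicates its input in the same way is an assumption.

```python
from itertools import cycle

# Stand-in colors (assumption); the documented function takes them
# from a matplotlib color map instead.
COLORS = ["#b3e2cd", "#fdcdac", "#cbd5e8", "#f4cae4"]


def simple_palette(categories, colors=COLORS):
    """Map each distinct category to a color, cycling if colors run out."""
    unique = list(dict.fromkeys(categories))  # keep first-seen order
    return dict(zip(unique, cycle(colors)))


palette = simple_palette(["p1", "p2", "p1", "p3"])
```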
- timelink.pandas.styles.style_color_row_by_category(row, palette, category='id')[source]
Color row by category. Function for styling dataframes.
Usage: display(df.style.apply(style_color_row_by_category, axis=1, palette=mypalette))
- Parameters:
row – this is passed by pandas when rendering the dataframe
palette – a dict that maps category values to colors
category – column that determines the row color
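A row-styling function of this kind returns one CSS string per cell. The sketch below shows that contract with a plain dict standing in for the row; `color_row` is a hypothetical name, and the exact CSS emitted by style_color_row_by_category is an assumption.

```python
def color_row(row, palette, category="id"):
    """Return one CSS string per cell, all using the row's category color."""
    color = palette.get(row[category], "")
    css = f"background-color: {color}" if color else ""
    return [css] * len(row)


palette = {"p1": "#b3e2cd", "p2": "#fdcdac"}
row = {"id": "p1", "name": "Manuel", "sex": "m"}  # stands in for a pandas row
styles = color_row(row, palette)
```

Because the function only indexes the row and takes its length, the same function also works when pandas passes a Series, as in the df.style.apply(..., axis=1, palette=...) usage shown above.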
- timelink.pandas.styles.styler_row_colors(df, category='id', columns=None, palette=None, cmap_name=None)[source]
Return a dataframe with the row color set according to a category
- Parameters:
df – dataframe
category – name of column with category that determines color, defaults to id
cmap_name – name of matplotlib color map to use, defaults to 'Pastel2'