Impacts estimation package

Module contents

Environmental impact estimation for Open Food Facts products

class impacts_estimation.impacts_estimation.ImpactEstimator(product, quantity=100, ignore_unknown_ingredients=True, use_defined_prct=True)[source]

Bases: object

_check_defined_percentages()[source]

Assert that the percentages that might be defined for some ingredients are valid.

_check_fermented_product()[source]

Checks if the product is fermented (alcohol or cheese, for example). In that case, carbohydrates should not be taken into account, as the carbohydrate input of the ingredients may differ from the carbohydrate content of the final product.

_check_ingredients()[source]

Performs some checks on multilevel ingredients.

_check_nutri_well_informed()[source]

Checks whether the nutritional information is well filled in (sum of nutriments > 50 g) and whether the category is ‘coffee’ or ‘pepper’.

In that case, the nutrients should not be taken into account, as the nutritional information of the product may be unreliable. The ‘nutriments_100g’ values are then forced to equal ingredients_data.

_check_product_water_loss()[source]

Some products (cheeses or butters, for example) may have a greater water loss than others. If the product belongs to a category with a high water-loss potential, the maximum evaporation parameter is adjusted automatically.

_remove_allergens(compound_ingredient=None)[source]

Removes allergens from the ingredient tree so that they are not considered as subingredients.

_remove_double_parenthesis(compound_ingredient=None)[source]

If an ingredient has the form ingredient(ingredient1(ingredient2)), ingredient1 is assumed to be additional information about ingredient and is ignored.

check_butters_product()[source]

Checks if the product is a butter. In that case, only the fat is taken into account.

estimate_impacts(impact_names, min_run_nb=30, max_run_nb=1000, forced_run_nb=None, confidence_interval_width=0.05, confidence_level=0.95, use_nutritional_info=True, const_relax_coef=0, maximum_evaporation=0.4, total_mass_used=None, min_prct_dist_size=30, dual_gap_type='absolute', dual_gap_limit=0.001, solver_time_limit=60, time_limit_dual_gap_limit=0.01, confidence_weighting=True, use_ingredients_impact_uncertainty=True, quantiles_points=('0.05', '0.25', '0.5', '0.75', '0.95'), distributions_as_result=False, confidence_score_weighting_factor=10)[source]

Loops, computing a new random recipe at each iteration, and stops when the geometric mean of the recipe impact values has stabilized within a given confidence interval.

Convergence is detected when the confidence interval of the arithmetic mean of the log of the impacts of the first n recipes, computed under a normal approximation, is narrow enough. The exponential of this mean is then taken to switch back to linear space and obtain the geometric mean of the impacts (the geometric mean is the exponential of the arithmetic mean of the log of the values).

Parameters
  • impact_names (str or list) – Iterable containing impacts names or single impact name.

  • min_run_nb (int) – Minimum number of runs for the Monte-Carlo loop. Too small a number may result in a falsely converging value.

  • max_run_nb (int) – Maximum number of runs for the Monte-Carlo loop.

  • forced_run_nb (int) – Used to bypass the natural Monte-Carlo stopping criterion and force the number of runs.

  • confidence_interval_width (float) – Width of the confidence interval that will determine the convergence detection.

  • confidence_level (float) – Confidence level of the confidence interval.

  • use_nutritional_info (bool) – Should nutritional information be used to estimate the recipe?

  • const_relax_coef (float) – Constraints relaxation coefficient. Allows relaxing the constraints on nutriments, water and mass balance to increase the chances of getting a result.

  • maximum_evaporation (float) – Upper bound of the evaporation coefficient, in [0, 1), i.e. the maximum proportion of the ingredients' water that can evaporate.

  • total_mass_used (float) – Total mass of ingredients used in grams, if known.

  • min_prct_dist_size (int) – Minimum size of the ingredient percentage distribution used to pick a proportion for an ingredient. If the distribution (restricted to the interval of possible values) has fewer data points, a uniform distribution is used instead.

  • dual_gap_type (str) – ‘absolute’ or ‘relative’. Determines the precision type of the variable optimization by the solver.

  • dual_gap_limit (float) – Determines the precision of the variable optimization by the solver. Relative or absolute according to dual_gap_type.

  • solver_time_limit (float) – Maximum time for the solver optimization (in seconds). Set to None or 0 to set no limit.

  • time_limit_dual_gap_limit (float) – Accepted precision of the solver in case of time limit hit. Relative or absolute according to dual_gap_type.

  • use_ingredients_impact_uncertainty (bool) – Should ingredients impacts uncertainty data be used?

  • confidence_weighting (bool) – Should the recipes be weighted by their confidence score (deviation of the recipes' nutritional composition from that of the reference product)?

  • quantiles_points (iterable) – List of impacts quantiles cutting points to return in the result.

  • distributions_as_result (bool) – Should the recipes, the distributions of the impact, the mean confidence interval and the confidence score be added to the result?

  • confidence_score_weighting_factor (float) – Weighting factor used for the confidence score calculation. It corresponds to the weight of the nutritional distance against the absolute difference between the total mass and 100g/100g.

Returns

Dictionary containing the result (the average impacts of all computed recipes) as well as other attributes such as the standard deviation of the impacts of all computed recipes, the list of unknown ingredients contained in the product, the average mass percentage of unknown ingredients.

Return type

dict
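The stopping rule described above can be illustrated with a minimal sketch. It is not the package's implementation: it ignores confidence weighting, uses a normal-approximation confidence interval on the mean of the log impacts, and assumes that convergence means the full interval width in log space is at most confidence_interval_width. The function name monte_carlo_geometric_mean and the sample_impact callback (standing in for "compute one random recipe and its impact") are hypothetical.

```python
import math
import random
import statistics
from typing import Callable


def monte_carlo_geometric_mean(
    sample_impact: Callable[[], float],
    min_run_nb: int = 30,
    max_run_nb: int = 1000,
    confidence_interval_width: float = 0.05,
    confidence_level: float = 0.95,
) -> tuple[float, int]:
    """Return (geometric mean of the sampled impacts, number of runs)."""
    # Two-sided z value, about 1.96 for a 95% confidence level.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence_level / 2)
    log_impacts: list[float] = []
    for run in range(1, max_run_nb + 1):
        # Work in log space: the mean of the logs is the log of the geometric mean.
        log_impacts.append(math.log(sample_impact()))
        # stdev needs at least two points.
        if run >= max(min_run_nb, 2):
            # Normal approximation of the sampling distribution of the mean of the logs.
            half_width = z * statistics.stdev(log_impacts) / math.sqrt(run)
            if 2 * half_width <= confidence_interval_width:
                break
    # Switch back to linear space.
    return math.exp(statistics.fmean(log_impacts)), len(log_impacts)


rng = random.Random(0)
geo_mean, runs = monte_carlo_geometric_mean(lambda: rng.lognormvariate(0.0, 0.1))
print(f"geometric mean ~ {geo_mean:.3f} after {runs} runs")
```

With a log-normal toy sampler of spread 0.1 the loop stops well before max_run_nb, while a constant sampler stops at exactly min_run_nb because its interval has zero width.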

reliability_score(const_relax_coef, uncharacterized_ingredients_mass_proportion)[source]

Reliability level of the result:
  • 1: Absolutely reliable, no indication of a potential issue in the input data nor in the result

  • 2: Less than 5% of the product ingredients are not in the OFF ingredients taxonomy, less than 5% of the estimated mass of the product is composed of ingredients that are not characterized nutritionally or environmentally, and the constraints may have been relaxed by less than 0.05% in order to get a result.

  • 3: Between 5% and 25% of the product ingredients are not in the OFF ingredients taxonomy, between 5% and 25% of the estimated mass of the product is composed of ingredients that are not characterized nutritionally or environmentally, and the constraints may have been relaxed by less than 0.05% in order to get a result.

  • 4: More than 25% of the ingredients are not in the OFF ingredients taxonomy, or more than 25% of the estimated mass of the product is composed of ingredients that are not characterized nutritionally or environmentally, or the constraints have been relaxed by more than 0.05% in order to get a result, or there is an important result warning.
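One way to read the four levels as a decision rule is sketched below. The function reliability_level and its arguments are hypothetical: the documented method only takes const_relax_coef and uncharacterized_ingredients_mass_proportion, and the units of those values are not stated here. The sketch assumes all inputs are percentages, lets the worst indicator decide the level, reads "no indication of a potential issue" as every indicator being zero, and places the exact 5% and 25% boundaries arbitrarily.

```python
def reliability_level(
    unknown_ingredients_prct: float,
    uncharacterized_mass_prct: float,
    relaxation_prct: float,
    has_result_warning: bool = False,
) -> int:
    """Map input-quality indicators (all in %) to a reliability level from 1 to 4."""
    # Level 4: any indicator above its upper threshold, or an important warning.
    if (
        has_result_warning
        or unknown_ingredients_prct > 25
        or uncharacterized_mass_prct > 25
        or relaxation_prct > 0.05
    ):
        return 4
    # Level 3: 5% to 25% unknown ingredients or uncharacterized mass.
    if unknown_ingredients_prct >= 5 or uncharacterized_mass_prct >= 5:
        return 3
    # Level 2: small but non-zero issues, including a slight constraint relaxation.
    if unknown_ingredients_prct > 0 or uncharacterized_mass_prct > 0 or relaxation_prct > 0:
        return 2
    # Level 1: no indication of a potential issue.
    return 1


print(reliability_level(3, 1, 0), reliability_level(10, 2, 0), reliability_level(0, 0, 0.1))
```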

class impacts_estimation.impacts_estimation.RandomRecipeCreator(product, use_defined_prct=True, use_nutritional_info=True, const_relax_coef=0, maximum_evaporation=0.4, total_mass_used=None, min_prct_dist_size=30, dual_gap_type='absolute', dual_gap_limit=0.001, solver_time_limit=60, time_limit_dual_gap_limit=0.01, allow_unbalanced_recipe=False, confidence_score_weighting_factor=10)[source]

Bases: object

_add_defined_percentage_constraints(product)[source]

Recursive function to add the constraints corresponding to the defined (sub)ingredients percentages. For top-level ingredients, defined percentages correspond to the percentage of the total mass of ingredients used before processing. For subingredients, the percentage corresponds either to the percentage of the parent ingredient or to the percentage of the product. This is determined by a preprocessing step made by ImpactEstimator._check_multilevel_ingredients. In cases where the percentage type is undefined, it is ignored.

Parameters

product (dict) – Dict corresponding to a product or a compound ingredient.

_add_evaporation_constraint()[source]

The product mass is bounded by:
  • Lower bound: the sum of the masses of the ingredients used, each multiplied by 1 minus the fraction of water it lost (evaporation coefficient multiplied by the water content of the ingredient). Ingredients with unknown water content are assumed to have a water content of 1 for the lower bound.

  • Upper bound: the same sum for ingredients with a known water content, plus the sum of the masses of ingredients with unknown water content (as their water content is assumed to be 0 for the upper bound).
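The two bounds can be written out as plain arithmetic. In the package they are linear constraints handed to the solver; this sketch instead evaluates them for a fixed evaporation coefficient, with water contents given as fractions in [0, 1] and None marking an unknown water content. The function product_mass_bounds and the ingredient values are hypothetical.

```python
from typing import Optional


def product_mass_bounds(
    masses: dict[str, float],
    water_contents: dict[str, Optional[float]],
    evaporation: float,
) -> tuple[float, float]:
    """Return (lower, upper) bounds of the product mass in g."""
    lower = upper = 0.0
    for ingredient, mass in masses.items():
        water = water_contents.get(ingredient)
        if water is None:
            # Unknown water content: 1 for the lower bound, 0 for the upper bound.
            lower += mass * (1 - evaporation)
            upper += mass
        else:
            # Known water content: the ingredient loses evaporation * water of its mass.
            remaining = mass * (1 - evaporation * water)
            lower += remaining
            upper += remaining
    return lower, upper


print(product_mass_bounds({"en:tomato": 80.0, "en:spice": 20.0},
                          {"en:tomato": 0.9, "en:spice": None}, 0.4))
```

Here the tomato keeps 80 × (1 − 0.4 × 0.9) = 51.2 g in both bounds, and the spice contributes 12 g to the lower bound and 20 g to the upper bound.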

_add_mass_order_constraints(product)[source]

Recursive function to add the constraint that each (sub)ingredient must be in higher proportion than the next one of the same level.

Parameters

product (dict) – Dict corresponding to a product or a compound ingredient.

_add_nutritional_constraints()[source]

Loops over all nutriments, adding for each one the constraint that the sum of the ingredient proportions weighted by their content in this nutriment must fit the nutritional content of the product.

_add_product_mass_constraint()[source]

The product mass is bounded by the sum of all nutriments and the remaining water

_add_total_leaves_percentage_constraint()[source]

The sum of the percentages of all leaf ingredients must be 100%.

_add_total_subingredients_percentages_constraint(ingredient)[source]

Recursive function to add for each compound ingredient the constraint that its percentage must equal the sum of the percentages of its subingredients.

Parameters

ingredient (dict) – Dict corresponding to a compound ingredient.

_add_used_mass_constraint()[source]

Adds the constraint that the total mass of ingredients used is bounded by the evaporation coefficient.

_get_variable_bounds(variable)[source]

Uses the solver to find the ingredient’s lower and upper bounds.

Parameters

variable (Variable) – Solver variable

Returns

Tuple containing ingredient lower and upper bounds

Return type

tuple

_optimize_variable(variable, direction='minimize')[source]

Optimize the model and return the variable value.

Parameters
  • variable (Variable) – Variable to optimize

  • direction (str) – ‘minimize’ or ‘maximize’

Returns

Value of the optimized variable.

Return type

float

_pick_proportion(ingredient_name, inf, sup)[source]

Chooses a random proportion for this ingredient.

Uses a reference percentage distribution if the distribution of this ingredient within this interval has enough data; otherwise uses a uniform distribution.

Parameters
  • ingredient_name (str) –

  • inf (float) – Lower bound

  • sup (float) – Upper bound

Returns

Proportion of this ingredient

Return type

float
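A sketch of the fallback logic of _pick_proportion follows. How the package samples from a reference distribution is not documented here; this version assumes the reference distribution is a plain list of observed percentages and resamples one observed value with random.choice. The function name pick_proportion and the reference_percentages argument are hypothetical.

```python
import random
from typing import Optional, Sequence


def pick_proportion(
    reference_percentages: Optional[Sequence[float]],
    inf: float,
    sup: float,
    min_prct_dist_size: int = 30,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a random percentage for one ingredient within [inf, sup]."""
    rng = rng if rng is not None else random.Random()
    # Keep only the reference observations that fall in the feasible interval.
    in_interval = [p for p in (reference_percentages or []) if inf <= p <= sup]
    if len(in_interval) >= min_prct_dist_size:
        # Enough data: draw from the empirical distribution.
        return rng.choice(in_interval)
    # Not enough data: fall back to a uniform distribution.
    return rng.uniform(inf, sup)


print(pick_proportion(None, 10.0, 20.0, rng=random.Random(1)))
```

Filtering before counting matters: a large reference distribution whose values all lie outside the current interval still falls back to the uniform case.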

_pick_total_mass(proportions, use_nutritional_info)[source]

Chooses the total mass of ingredients used by maximizing the confidence score of the resulting recipe.

Parameters

proportions (dict) – Proportions of the ingredients.

Returns

Total mass of ingredients used in g.

Return type

float

_remove_decreasing_order_constraint_from_rank(rank)[source]

Removes the decreasing proportion order constraint for all ingredients from the given rank. If an ingredient is below a certain proportion (2% in EU regulation), it may not be indicated in decreasing proportion order.

Parameters

rank (int) – Rank of the ingredient from which the decreasing proportion order constraint shall be replaced by a maximum proportion constraint.

random_recipe(use_nutritional_info=True)[source]

Create a possible recipe of a product given its ingredient list and nutritional data. The recipe is given for 100g of final product.

Notes

The recipe of the product is estimated randomly. To do this, a linear programming solver is defined with these constraints:

  • the sum of all ingredient percentages must be 100;

  • the ingredients percentages are given in decreasing order;

  • the nutritional composition of the product is the sum of the nutritional compositions of its ingredients (with an error margin specified by nutritional_info_precision);

  • some ingredients may have a defined percentage.

Once the solver has been set up, the algorithm loops through the ingredients in random order and computes the interval of possible values for each one using the solver. Once this interval is defined, it chooses a random proportion for the ingredient within the interval and adds this value as a new constraint for the solver. If the ingredient has a reference percentage distribution (computed from existing OFF data), the random value is picked following this distribution; otherwise, a uniform distribution is used. Once all ingredient proportions have been defined, the same operation is done on the total mass used variable, by maximizing the confidence score of the resulting recipe.

Returns

Dictionary containing a possible recipe with ingredients ids as keys and masses in g as values.

Return type

dict

static recipe_from_proportions(proportions, total_mass)[source]

Returns a recipe from ingredients proportions and a total mass. Sums masses of ingredients used multiple times.

Parameters
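  • proportions (dict) – Proportions of the ingredients (fractions of the total mass), with ingredient ids as keys.

  • total_mass (float) – Total mass of ingredients used in g.

Returns

Recipe with ingredient ids as keys and masses in g as values.

Return type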

dict

Examples

>>> RandomRecipeCreator.recipe_from_proportions({'en:egg': 0.7, 'en:flour': 0.3}, 150)
{'en:flour': 45.0, 'en:egg': 105.0}
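The summing of repeated ingredients can be sketched with the id convention documented for individualize_ingredients and original_id, where a repeated ingredient gets trailing '*' characters. This is an illustrative reimplementation under that assumption, not the package's source; the helper names are hypothetical, and proportions are fractions summing to 1, as in the example above.

```python
from collections import defaultdict


def original_id(individualized_id: str) -> str:
    # Individualized ids differ from the original id only by trailing '*' characters.
    return individualized_id.rstrip("*")


def recipe_from_proportions(proportions: dict[str, float], total_mass: float) -> dict[str, float]:
    """Convert proportions (fractions summing to 1) into masses in g."""
    recipe = defaultdict(float)
    for ingredient_id, proportion in proportions.items():
        # Ingredients used several times ('en:egg', 'en:egg*') are summed together.
        recipe[original_id(ingredient_id)] += proportion * total_mass
    return dict(recipe)


print(recipe_from_proportions({"en:egg": 0.5, "en:flour": 0.3, "en:egg*": 0.2}, 150))
```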

class impacts_estimation.impacts_estimation.RecipeImpactCalculator(recipe, impact_name, use_uncertainty=False)[source]

Bases: object

_compute_impact()[source]

_compute_impact_shares()[source]

_define_ingredients_impacts()[source]

Gets the impact of each ingredient. If the ingredient has no uncertainty parameters or use_uncertainty is set to False, the default value is used; otherwise, a value is picked using the uncertainty parameters.

get_ingredient_impact_share(ingredient)[source]

Returns the share of the recipe impact that is due to the given ingredient

get_recipe_impact()[source]

Calculate the environmental impact from a product recipe.

Warning

Any ingredients whose impact is unknown will be considered to have the average impact of the product.

Returns

Impact of the product

Return type

float
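The warning above can be read as: ingredients with an unknown impact receive the mass-weighted average impact of the ingredients whose impact is known. The sketch below implements that reading for both get_recipe_impact and get_ingredient_impact_share. It assumes impacts are given per gram of ingredient and ignores use_uncertainty; the function names, the per-gram unit and the example values are hypothetical.

```python
def average_impact_per_gram(recipe: dict[str, float], impacts: dict[str, float]) -> float:
    """Mass-weighted average impact of the ingredients whose impact is known."""
    known_mass = sum(mass for ing, mass in recipe.items() if ing in impacts)
    if known_mass == 0:
        raise ValueError("no ingredient with a known impact")
    return sum(mass * impacts[ing] for ing, mass in recipe.items() if ing in impacts) / known_mass


def recipe_impact(recipe: dict[str, float], impacts: dict[str, float]) -> float:
    """Total impact; unknown ingredients get the average impact per gram."""
    average = average_impact_per_gram(recipe, impacts)
    return sum(mass * impacts.get(ing, average) for ing, mass in recipe.items())


def impact_share(recipe: dict[str, float], impacts: dict[str, float], ingredient: str) -> float:
    """Fraction of the recipe impact due to one ingredient."""
    average = average_impact_per_gram(recipe, impacts)
    return recipe[ingredient] * impacts.get(ingredient, average) / recipe_impact(recipe, impacts)


recipe = {"en:wheat-flour": 60.0, "en:sugar": 30.0, "en:unknown-spice": 10.0}
impacts = {"en:wheat-flour": 0.001, "en:sugar": 0.002}  # hypothetical values per gram
print(recipe_impact(recipe, impacts), impact_share(recipe, impacts, "en:sugar"))
```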

impacts_estimation.impacts_estimation.estimate_impacts(product, impact_names, quantity=100, ignore_unknown_ingredients=True, min_run_nb=30, max_run_nb=1000, forced_run_nb=None, confidence_interval_width=0.05, confidence_level=0.95, use_nutritional_info=True, const_relax_coef=0, use_defined_prct=True, maximum_evaporation=0.4, total_mass_used=None, min_prct_dist_size=30, dual_gap_type='absolute', dual_gap_limit=0.001, solver_time_limit=60, time_limit_dual_gap_limit=0.01, confidence_weighting=True, use_ingredients_impact_uncertainty=True, quantiles_points=('0.05', '0.25', '0.5', '0.75', '0.95'), distributions_as_result=False, confidence_score_weighting_factor=10, safe_mode=True)[source]

Wrapper for impact estimation.

Parameters
  • product (dict) – Dict containing an Open Food Facts product. It must contain the keys “ingredients” and “nutriments”.

  • impact_names (str or list) – Iterable containing impacts names or single impact name.

  • quantity (float) – Quantity of product in grams for which the impact must be calculated. Default is 100g.

  • ignore_unknown_ingredients (bool) – Should ingredients absent from the OFF taxonomy and without a defined percentage be considered as parsing errors and ignored?

  • min_run_nb (int) – Minimum number of runs for the Monte-Carlo loop. Too small a number may result in a falsely converging value.

  • max_run_nb (int) – Maximum number of runs for the Monte-Carlo loop.

  • forced_run_nb (int) – Used to bypass the natural Monte-Carlo stopping criterion and force the number of runs.

  • confidence_interval_width (float) – Width of the confidence interval that will determine the convergence detection.

  • confidence_level (float) – Confidence level of the confidence interval.

  • use_nutritional_info (bool) – Should nutritional information be used to estimate the recipe?

  • const_relax_coef (float) – Constraints relaxation coefficient. Allows relaxing the constraints on nutriments, water and mass balance to increase the chances of getting a result.

  • use_defined_prct (bool) – Should ingredients percentages defined in the product be used?

  • maximum_evaporation (float) – Upper bound of the evaporation coefficient, in [0, 1), i.e. the maximum proportion of the ingredients' water that can evaporate.

  • total_mass_used (float) – Total mass of ingredients used in grams, if known.

  • min_prct_dist_size (int) – Minimum size of the ingredient percentage distribution used to pick a proportion for an ingredient. If the distribution (restricted to the interval of possible values) has fewer data points, a uniform distribution is used instead.

  • dual_gap_type (str) – ‘absolute’ or ‘relative’. Determines the precision type of the variable optimization by the solver.

  • dual_gap_limit (float) – Determines the precision of the variable optimization by the solver. Relative or absolute according to dual_gap_type.

  • solver_time_limit (float) – Maximum time for the solver optimization (in seconds). Set to None or 0 to set no limit.

  • time_limit_dual_gap_limit (float) – Accepted precision of the solver in case of time limit hit. Relative or absolute according to dual_gap_type.

  • confidence_weighting (bool) – Should the recipes be weighted by their confidence score (deviation of the recipes' nutritional composition from that of the reference product)?

  • use_ingredients_impact_uncertainty (bool) – Should ingredients impacts uncertainty data be used?

  • quantiles_points (iterable) – List of impacts quantiles cutting points to return in the result.

  • distributions_as_result (bool) – Should the recipes, the distributions of the impact, the mean confidence interval and the confidence score be added to the result?

  • confidence_score_weighting_factor (float) – Weighting factor used for the confidence score calculation. It corresponds to the weight of the nutritional distance against the absolute difference between the total mass and 100g/100g.

  • safe_mode (bool) – If set to True, the constraints will be progressively relaxed in order to get a result.

impacts_estimation.impacts_estimation.estimate_impacts_safe(product, impact_names, **kwargs)[source]

impacts_estimation.impacts_estimation.impact_from_recipe(recipe, impact_name, use_uncertainty=False)[source]

Wrapper for RecipeImpactCalculator

Functions used by the environmental impact estimation program

class impacts_estimation.utils.UnknownIngredientsRemover[source]

Bases: object

remove_unknown_ingredients(product)[source]

Recursive function to remove ingredients that are not in the OFF taxonomy and have neither a defined percentage nor valid subingredients.

impacts_estimation.utils.agribalyse_impact_name_i18n(impact_name)[source]

Returns the French version of an impact name

Parameters

impact_name (str) –

Examples

>>> agribalyse_impact_name_i18n('Climate change')
'Changement climatique'
>>> agribalyse_impact_name_i18n("Appauvrissement de la couche d'ozone")
"Appauvrissement de la couche d'ozone"

impacts_estimation.utils.clear_ingredient_graph(product)[source]

Recursive function to search the ingredients graph and remove the subingredients of an ingredient if all of them are uncharacterized.

Parameters

product (dict) – Dict corresponding to a product or a compound ingredient.

impacts_estimation.utils.confidence_score(nutri, reference_nutri, total_mass, min_possible_mass, max_possible_mass, weighting_factor=10, reference_mass=100)[source]

Calculate the confidence score of a nutritional composition using the Euclidean distance between the reference nutritional composition and the assessed nutritional composition in the space of all considered nutriment contents, together with the total mass of ingredients used. The closer the nutritional composition is to the reference, the higher the confidence score; likewise, the closer the total mass of ingredients is to 100g/100g, the higher the confidence score.

The score is defined as the inverse of the sum of the nutritional distance and the absolute difference between the total mass and 100g/100g weighted by a weighting factor.

Parameters
  • nutri (dict) – Nutritional composition to evaluate.

  • reference_nutri (dict) – Nutritional composition of the reference product.

  • total_mass (float) – Total mass of ingredients used in g.

  • min_possible_mass (float) – Minimum possible total ingredient mass for a product in g

  • max_possible_mass (float) – Maximum possible total ingredient mass for a product in g

  • weighting_factor (float) – Weight of the nutritional distance against the absolute difference between the total mass and 100g/100g.

  • reference_mass (float) – Mass for which the nutritional compositions are expressed (in g).

Returns

Confidence score

Return type

float
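A minimal sketch of the score defined above, with its assumptions stated explicitly. It computes the Euclidean distance over the nutriments present in both compositions, expresses the mass term as |total_mass / reference_mass − 1| (the distance to 100g/100g), and applies weighting_factor to the mass term, which is one reading of the definition; the parameter description can also be read as weighting the nutritional distance. The min_possible_mass and max_possible_mass arguments, probably used for normalization, are left out because their role is not described. confidence_score_sketch is a hypothetical name.

```python
import math


def confidence_score_sketch(
    nutri: dict[str, float],
    reference_nutri: dict[str, float],
    total_mass: float,
    weighting_factor: float = 10,
    reference_mass: float = 100,
) -> float:
    # Euclidean distance between the two compositions over the nutriments they share.
    shared = nutri.keys() & reference_nutri.keys()
    nutritional_distance = math.sqrt(sum((nutri[k] - reference_nutri[k]) ** 2 for k in shared))
    # Distance of the total ingredient mass to 100g/100g, i.e. to a ratio of 1.
    mass_difference = abs(total_mass / reference_mass - 1)
    # ASSUMPTION: the weighting factor multiplies the mass term.
    denominator = nutritional_distance + weighting_factor * mass_difference
    # A perfect match has a zero denominator; report it as an infinite score.
    return math.inf if denominator == 0 else 1 / denominator


print(confidence_score_sketch({"fat": 13.0}, {"fat": 10.0}, 110.0))
```

With a 3 g fat gap and 110 g of ingredients per 100 g of product, the denominator is 3 + 10 × 0.1 = 4, so the score is about 0.25.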

impacts_estimation.utils.define_subingredients_percentage_type(product)[source]

Recursive function to search the ingredients graph and define if the subingredients percentages are defined as percentage of their parent ingredient or the whole product.

Parameters

product (dict) – Dict corresponding to a product or a compound ingredient.

impacts_estimation.utils.find_ingredients_graph_leaves(product)[source]

Recursive function to search the ingredients graph and find its leaves.

Parameters

product (dict) – Dict corresponding to a product or a compound ingredient.

Returns

List containing the ingredients graph leaves.

Return type

list

impacts_estimation.utils.flat_ingredients_list_BFS(product)[source]

Recursive function to search the ingredients graph using a breadth-first search and return it as a flat list of all nodes. Subingredients are placed at the end of the list.

Parameters

product (dict) – Dict corresponding to a product or a compound ingredient.

Returns

List containing all the ingredients graph nodes.

Return type

list

impacts_estimation.utils.flat_ingredients_list_DFS(product)[source]

Recursive function to search the ingredients graph using a depth-first search and return it as a flat list of all nodes. Subingredients are placed right after their parents.

Parameters

product (dict) – Dict corresponding to a product or a compound ingredient.

Returns

List containing all the ingredients graph nodes.

Return type

list
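The difference between the two traversal orders, and what find_ingredients_graph_leaves returns, is easiest to see on a small nested product. The sketch assumes compound ingredients hold their subingredients under an "ingredients" key, as in the individualize_ingredients example. The breadth-first version uses an explicit queue (collections.deque) rather than recursion, and all function names are hypothetical.

```python
from collections import deque


def flat_dfs(product: dict) -> list[dict]:
    """Depth-first: each subingredient follows its parent."""
    nodes = []
    for ingredient in product.get("ingredients", []):
        nodes.append(ingredient)
        nodes.extend(flat_dfs(ingredient))
    return nodes


def flat_bfs(product: dict) -> list[dict]:
    """Breadth-first: a whole level is listed before the next one."""
    nodes = []
    queue = deque(product.get("ingredients", []))
    while queue:
        ingredient = queue.popleft()
        nodes.append(ingredient)
        queue.extend(ingredient.get("ingredients", []))
    return nodes


def leaves(product: dict) -> list[dict]:
    """Ingredients without subingredients, in depth-first order."""
    return [i for i in flat_dfs(product) if not i.get("ingredients")]


product = {"ingredients": [{"id": "A", "ingredients": [{"id": "A1"}, {"id": "A2"}]}, {"id": "B"}]}
print([i["id"] for i in flat_dfs(product)])  # ['A', 'A1', 'A2', 'B']
print([i["id"] for i in flat_bfs(product)])  # ['A', 'B', 'A1', 'A2']
print([i["id"] for i in leaves(product)])    # ['A1', 'A2', 'B']
```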

impacts_estimation.utils.individualize_ingredients(product, previous_ingredients_ids=None)[source]

Processes an ingredient list in place so that every ingredient has a distinct id.

Parameters
  • product (dict) – Dict corresponding to a product, containing a list of ingredients, may contain compound ingredients

  • previous_ingredients_ids (list) – List containing ingredients ids. Needed only for recursive call

Examples

>>> product = {'ingredients': [{'id': 'A'}, {'id': 'B', 'ingredients': [{'id': 'A'}]}, {'id': 'B'}]}
>>> individualize_ingredients(product)
>>> print(product)
{'ingredients': [{'id': 'A'}, {'id': 'B', 'ingredients': [{'id': 'A*'}]}, {'id': 'B*'}]}
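The renaming rule in the example above can be reproduced with a depth-first pass that shares the list of ids already seen. This sketch (individualize_ids is a hypothetical name) yields exactly the documented output; that the package visits nodes in the same order in every case is an assumption.

```python
from typing import Optional


def individualize_ids(product: dict, previous_ids: Optional[list] = None) -> None:
    """Make every ingredient id unique, in place, by appending '*'."""
    if previous_ids is None:
        previous_ids = []
    for ingredient in product.get("ingredients", []):
        # Append '*' until the id has not been seen yet in the traversal.
        while ingredient["id"] in previous_ids:
            ingredient["id"] += "*"
        previous_ids.append(ingredient["id"])
        # Recurse into compound ingredients, sharing the list of seen ids.
        individualize_ids(ingredient, previous_ids)


product = {"ingredients": [{"id": "A"}, {"id": "B", "ingredients": [{"id": "A"}]}, {"id": "B"}]}
individualize_ids(product)
print(product)
```

An id repeated three times becomes x, x* and x**, which is why original_id strips every trailing asterisk.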

impacts_estimation.utils.maximum_percentage_sum(ingredients)[source]

Computes the maximum sum of ingredient percentages for ingredients given in decreasing percentage order, even if some ingredients do not have a percentage.

Notes

This is useful to estimate if subingredients percentages are defined in percentage of their parent ingredient or in percentage of the total product.

Parameters

ingredients (list) – List of dicts corresponding to the ingredients

Returns

Maximum value of the sum of all ingredients percentages.

Return type

float

impacts_estimation.utils.minimum_percentage_sum(ingredients)[source]

Computes the minimum sum of ingredient percentages for ingredients given in decreasing percentage order, even if some ingredients do not have a percentage.

Notes

This is useful to estimate if subingredients percentages are defined in percentage of their parent ingredient or in percentage of the total product.

Parameters

ingredients (list) – List of dicts corresponding to the ingredients

Returns

Minimum value of the sum of all ingredients percentages.

Return type

float

impacts_estimation.utils.natural_bounds(rank, nb_ingredients)[source]

Computes the upper and lower bounds of the proportion of an ingredient depending on its rank and the number of ingredients in the product given that they are in decreasing proportion order.

Examples

>>> natural_bounds(2, 4)
(0.0, 50.0)
>>> natural_bounds(1, 5)
(20.0, 100.0)

Parameters
  • rank (int) – Rank of the ingredient in the list

  • nb_ingredients (int) – Number of ingredients in the product

Returns

Lower and upper bounds of the proportion of the ingredient

Return type

tuple
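Both documented examples follow from a short argument. If n proportions are in decreasing order and sum to 100, the ingredient at rank r (1-based, as the examples imply) can be at most 100/r, reached when the first r ingredients are equal; the first ingredient is at least 100/n, while any later ingredient can be 0. The sketch below encodes this reasoning; natural_bounds_sketch is a hypothetical name, and the package may compute the bounds differently.

```python
def natural_bounds_sketch(rank: int, nb_ingredients: int) -> tuple[float, float]:
    """Bounds (in %) of the rank-th ingredient (1-based) among nb_ingredients
    listed in decreasing proportion order and summing to 100%."""
    # Upper bound: the first `rank` ingredients share 100% equally.
    upper = 100 / rank
    # Lower bound: the first ingredient is at least the average share;
    # any later ingredient can be 0.
    lower = 100 / nb_ingredients if rank == 1 else 0.0
    return lower, upper


print(natural_bounds_sketch(2, 4))  # (0.0, 50.0)
print(natural_bounds_sketch(1, 5))  # (20.0, 100.0)
```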

impacts_estimation.utils.nutriments_from_recipe(recipe)[source]

Returns the nutriment content of a product recipe as a weighted sum of the ingredient masses and their reference nutriment contents.

Parameters

recipe (dict) – Dict containing ingredients as keys and masses in grams as values

Warning

Any ingredients whose nutriment content is unknown will be considered to have the average nutriment content of the product.

Returns

Dictionary with nutriments as keys and nutriment contents as values

Return type

dict

impacts_estimation.utils.nutritional_error_margin(nutriment, value)[source]

Returns the error margin of a product’s nutriment according to EU directives

Parameters
  • nutriment (str) – Nutriment considered

  • value (float) – Given product content of the considered nutriment

Returns

Dictionary containing absolute and relative margins (only one of which is different from 0)

Return type

dict

Examples

>>> nutritional_error_margin('proteins', 0.05)
{'absolute': 0.02, 'relative': 0}
>>> nutritional_error_margin('proteins', 0.3)
{'absolute': 0, 'relative': 0.2}
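The examples are consistent with the European Commission's guidance on nutrition-labelling tolerances, as we understand it: for proteins, carbohydrates, sugars and fibre, ±2 g per 100 g below 10 g, ±20% between 10 and 40 g, and ±8 g above 40 g (fat uses ±1.5 g in the lowest band). The sketch below assumes contents are expressed as fractions (g per g of product), covers only these nutriments, and places the band boundaries arbitrarily. error_margin_sketch is a hypothetical name, and other nutriments (such as salt) are not handled.

```python
def error_margin_sketch(nutriment: str, value: float) -> dict[str, float]:
    """Tolerance for a nutriment content expressed as a fraction (g per g of product)."""
    # Absolute margin used below 10 g/100 g; raises KeyError for unsupported nutriments.
    low_content_margin = {
        "proteins": 0.02,
        "carbohydrates": 0.02,
        "sugars": 0.02,
        "fiber": 0.02,
        "fat": 0.015,
    }[nutriment]
    if value < 0.10:
        return {"absolute": low_content_margin, "relative": 0}
    if value <= 0.40:
        # Between 10 and 40 g/100 g: a relative margin of 20%.
        return {"absolute": 0, "relative": 0.2}
    # Above 40 g/100 g: an absolute margin of 8 g/100 g.
    return {"absolute": 0.08, "relative": 0}


print(error_margin_sketch("proteins", 0.05), error_margin_sketch("proteins", 0.3))
```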

impacts_estimation.utils.original_id(individualized_id)[source]

Gets the original id of an ingredient that has been transformed by individualize_ingredients()

Parameters

individualized_id (str) –

Returns

Return type

str

Examples

>>> original_id('en:water**')
'en:water'
>>> original_id('en:sugar')
'en:sugar'

impacts_estimation.utils.remove_percentage_from_product(product)[source]

Removes the defined percentage of ingredients.

Parameters

product (dict) –

impacts_estimation.utils.weighted_geometric_mean(values, weights)[source]

Returns the weighted geometric mean of values.

Parameters
  • values (iterable) –

  • weights (iterable) –

Returns

Return type

float
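The weighted geometric mean is exp(Σ wᵢ·log vᵢ / Σ wᵢ), defined for strictly positive values. A direct sketch follows; the name is hypothetical, and it is an assumption that the package combines confidence-weighted recipe impacts this way.

```python
import math
from typing import Iterable


def weighted_geometric_mean_sketch(values: Iterable[float], weights: Iterable[float]) -> float:
    """exp(sum(w * log(v)) / sum(w)) for strictly positive values."""
    pairs = list(zip(values, weights))
    total_weight = sum(w for _, w in pairs)
    # Weighted arithmetic mean in log space, then back to linear space.
    log_mean = sum(w * math.log(v) for v, w in pairs) / total_weight
    return math.exp(log_mean)


print(weighted_geometric_mean_sketch([1.0, 100.0], [1.0, 1.0]))  # close to 10
print(weighted_geometric_mean_sketch([1.0, 100.0], [1.0, 3.0]))  # close to 100 ** 0.75
```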

Exceptions used by the impact estimation program

exception impacts_estimation.exceptions.NoCharacterizedIngredientsError[source]

Bases: Exception

exception impacts_estimation.exceptions.NoKnownIngredientsError[source]

Bases: Exception

exception impacts_estimation.exceptions.RecipeCreationError[source]

Bases: Exception

exception impacts_estimation.exceptions.SolverTimeoutError[source]

Bases: Exception