Impacts estimation package
Module contents
Environmental impact estimation for Open Food Facts products
- class impacts_estimation.impacts_estimation.ImpactEstimator(product, quantity=100, ignore_unknown_ingredients=True, use_defined_prct=True)[source]
Bases: object
- _check_defined_percentages()[source]
Assert that the percentages that might be defined for some ingredients are valid.
- _check_fermented_product()[source]
Checks if the product is fermented (alcohol or cheese, for example). In that case, carbohydrates should not be taken into account, as the carbohydrate input of the ingredients may not be the same as the output in the product.
- _check_nutri_well_informed()[source]
Checks if the nutritional information is well informed (nutriment sum > 50g) and the category is ‘coffee’ or ‘pepper’.
In that case, the nutriments should not be taken into account, as the nutritional information of the product may not be well informed. The ‘nutriments_100g’ values are forced to be equal to the ingredients_data.
- _check_product_water_loss()[source]
Some products (cheeses or butters, for example) may have a bigger water loss than others. If the product is in a category with a high water loss potential, the maximum evaporation parameter is automatically adjusted.
- _remove_allergens(compound_ingredient=None)[source]
Removes allergens from the ingredient tree so that they are not considered as subingredients.
- _remove_double_parenthesis(compound_ingredient=None)[source]
If an ingredient has the form ingredient(ingredient1(ingredient2)), ingredient1 is assumed to be additional information about ingredient and is ignored.
- check_butters_product()[source]
Checks if the product is a butter. In that case, only the fat is taken into account.
- estimate_impacts(impact_names, min_run_nb=30, max_run_nb=1000, forced_run_nb=None, confidence_interval_width=0.05, confidence_level=0.95, use_nutritional_info=True, const_relax_coef=0, maximum_evaporation=0.4, total_mass_used=None, min_prct_dist_size=30, dual_gap_type='absolute', dual_gap_limit=0.001, solver_time_limit=60, time_limit_dual_gap_limit=0.01, confidence_weighting=True, use_ingredients_impact_uncertainty=True, quantiles_points=('0.05', '0.25', '0.5', '0.75', '0.95'), distributions_as_result=False, confidence_score_weighting_factor=10)[source]
Loops, computing a new random recipe at each iteration, and stops when the geometric mean of the recipes’ impact values has stabilized within a given confidence interval.
Convergence is detected when the arithmetic mean of the logarithm of the impacts of the first n recipes follows a normal distribution with a small enough confidence interval. The exponential of this mean is then taken to switch back to linear space and obtain the geometric mean of the impacts (the geometric mean is the exponential of the arithmetic mean of the logs of the values).
- Parameters
impact_names (str or list) – Iterable containing impacts names or single impact name.
min_run_nb (int) – Minimum number of runs for the Monte-Carlo loop. Too small a number may result in a falsely converging value.
max_run_nb (int) – Maximum number of runs for the Monte-Carlo loop.
forced_run_nb (int) – Used to bypass natural Monte-Carlo stopping criteria and force the number of runs
confidence_interval_width (float) – Width of the confidence interval that will determine the convergence detection.
confidence_level (float) – Confidence level of the confidence interval.
use_nutritional_info (bool) – Should nutritional information be used to estimate recipe?
const_relax_coef (float) – Constraints relaxation coefficient. Allows constraints on nutriments, water and mass balance to be relaxed to increase the chances of getting a result.
maximum_evaporation (float) – Upper bound of the evaporation coefficient, in [0, 1), i.e. the maximum proportion of the ingredients’ water that can evaporate.
total_mass_used (float) – Total mass of ingredients used in grams, if known.
min_prct_dist_size (int) – Minimum size of the ingredients percentage distribution that will be used to pick a proportion for an ingredient. If the distribution (adjusted to the interval of possible values) has fewer data points, a uniform distribution will be used instead.
dual_gap_type (str) – ‘absolute’ or ‘relative’. Determines the precision type of the variable optimization by the solver.
dual_gap_limit (float) – Determines the precision of the variable optimization by the solver. Relative or absolute according to dual_gap_type.
solver_time_limit (float) – Maximum time for the solver optimization (in seconds). Set to None or 0 to set no limit.
time_limit_dual_gap_limit (float) – Accepted precision of the solver in case of time limit hit. Relative or absolute according to dual_gap_type.
use_ingredients_impact_uncertainty (bool) – Should ingredients impacts uncertainty data be used?
confidence_weighting (bool) – Should the recipes be weighted by their confidence score (deviation of the recipe’s nutritional composition from the reference product)?
quantiles_points (iterable) – List of impacts quantiles cutting points to return in the result.
distributions_as_result (bool) – Should the recipes, the distributions of the impact, the mean confidence interval and the confidence score be added to the result?
confidence_score_weighting_factor (float) – Weighting factor used for the confidence score calculation. It corresponds to the weight of the nutritional distance against the absolute difference between the total mass and 100g/100g.
- Returns
Dictionary containing the result (the average impacts of all computed recipes) as well as other attributes such as the standard deviation of the impacts of all computed recipes, the list of unknown ingredients contained in the product, the average mass percentage of unknown ingredients.
- Return type
dict
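The stopping criterion described for estimate_impacts can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the function name geometric_mean_converged is hypothetical, and reading confidence_interval_width as the width of the interval relative to the geometric mean is an assumption.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_converged(impacts, confidence_level=0.95,
                             confidence_interval_width=0.05):
    """Return (converged, geometric_mean) for a list of positive impacts.

    Hypothetical helper: the interval width is assumed to be expressed
    relative to the geometric mean.
    """
    if len(impacts) < 2:
        return False, None
    logs = [math.log(value) for value in impacts]
    log_mean = mean(logs)
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    half_width = z * stdev(logs) / math.sqrt(len(logs))
    # Back to linear space: exp(mean of logs) is the geometric mean.
    geo_mean = math.exp(log_mean)
    lower = math.exp(log_mean - half_width)
    upper = math.exp(log_mean + half_width)
    converged = (upper - lower) / geo_mean <= confidence_interval_width
    return converged, geo_mean
```

In a Monte-Carlo loop, such a check would be evaluated after each new recipe once min_run_nb runs have been done, and the loop would stop at max_run_nb at the latest.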
- reliability_score(const_relax_coef, uncharacterized_ingredients_mass_proportion)[source]
Reliability level of the result:
- 1: Absolutely reliable, no indication of a potential issue in the input data or in the result.
- 2: Less than 5% of the product ingredients are not in the OFF ingredients taxonomy and less than 5% of the estimated mass of the product is composed of ingredients that are not characterized nutritionally or environmentally, and the constraints may have been relaxed by less than 0.05% in order to get a result.
- 3: Between 5% and 25% of the product ingredients are not in the OFF ingredients taxonomy and between 5% and 25% of the estimated mass of the product is composed of ingredients that are not characterized nutritionally or environmentally, and the constraints may have been relaxed by less than 0.05% in order to get a result.
- 4: More than 25% of the ingredients are not in the OFF ingredients taxonomy, or more than 25% of the estimated mass of the product is composed of ingredients that are not characterized nutritionally or environmentally, or the constraints have been relaxed by more than 0.05% in order to get a result, or there is an important result warning.
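One possible reading of these levels is sketched below. Everything here is an assumption made for illustration: the function name, the explicit arguments (the documented method only takes const_relax_coef and uncharacterized_ingredients_mass_proportion), the choice to let the worst indicator decide in mixed cases, and the interpretation of "0.05%" as a fraction of 0.0005.

```python
def reliability_level(unknown_ingredients_share, uncharacterized_mass_share,
                      relaxation, has_warning=False,
                      relaxation_threshold=0.0005):
    """Hypothetical mapping from quality indicators to a level from 1 to 4.

    Shares are fractions in [0, 1]. relaxation_threshold=0.0005 assumes
    that "0.05%" applies to a relaxation coefficient given as a fraction.
    """
    # Assumption: the worst of the two shares decides the level.
    worst_share = max(unknown_ingredients_share, uncharacterized_mass_share)
    if worst_share > 0.25 or relaxation > relaxation_threshold or has_warning:
        return 4
    if worst_share == 0 and relaxation == 0:
        return 1
    if worst_share < 0.05:
        return 2
    return 3
```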
- class impacts_estimation.impacts_estimation.RandomRecipeCreator(product, use_defined_prct=True, use_nutritional_info=True, const_relax_coef=0, maximum_evaporation=0.4, total_mass_used=None, min_prct_dist_size=30, dual_gap_type='absolute', dual_gap_limit=0.001, solver_time_limit=60, time_limit_dual_gap_limit=0.01, allow_unbalanced_recipe=False, confidence_score_weighting_factor=10)[source]
Bases: object
- _add_defined_percentage_constraints(product)[source]
Recursive function to add the constraints corresponding to the defined (sub)ingredients percentages. For top-level ingredients, defined percentages correspond to the percentage of the total mass of ingredients used before processing. For subingredients, the percentage corresponds either to the percentage of the parent ingredient or to the percentage of the product. This is determined by a preprocessing step made by ImpactEstimator._check_multilevel_ingredients. In cases where the percentage type is undefined, it is ignored.
- Parameters
product (dict) – Dict corresponding to a product or a compound ingredient.
- _add_evaporation_constraint()[source]
The product mass is bounded by:
- Lower bound: the sum of the ingredient masses used, each multiplied by 1 minus the water it lost (evaporation coefficient multiplied by the water content of the ingredient). Ingredients with unknown water content are assumed to have a water content of 1 for the lower bound.
- Upper bound: the sum of the ingredient masses used, each multiplied by 1 minus the water it lost (evaporation coefficient multiplied by the water content of the ingredient) for ingredients with a known water content, plus the sum of the masses of ingredients with unknown water content (as their water content is assumed to be 0 for the upper bound).
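The two bounds can be illustrated numerically. The sketch below is not the solver constraint itself: it evaluates both bounds for fixed masses and a fixed evaporation coefficient, and the function name and data layout are hypothetical.

```python
def product_mass_bounds(masses, water_contents, evaporation_coefficient):
    """Return (lower, upper) bounds of the product mass in grams.

    masses: ingredient id -> mass used in g.
    water_contents: ingredient id -> water fraction in [0, 1]; ids that
    are missing are treated as unknown (hypothetical data layout).
    """
    lower = upper = 0.0
    for ingredient, mass in masses.items():
        water = water_contents.get(ingredient)
        # Unknown water content: 1 for the lower bound, 0 for the upper one.
        water_low = 1.0 if water is None else water
        water_up = 0.0 if water is None else water
        lower += mass * (1 - evaporation_coefficient * water_low)
        upper += mass * (1 - evaporation_coefficient * water_up)
    return lower, upper
```

For example, with made-up values of 100 g of an ingredient containing 50% water, 50 g of an ingredient with unknown water content and an evaporation coefficient of 0.4, the bounds are about 110 g and 130 g.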
- _add_mass_order_constraints(product)[source]
Recursive function to add the constraint that each (sub)ingredient must be in higher proportion than the next one of the same level.
- Parameters
product (dict) – Dict corresponding to a product or a compound ingredient.
- _add_nutritional_constraints()[source]
Looping on all nutriments to add the constraint that the sum of the ingredients proportions weighted by their content in this nutriment must fit the nutritional content of the product.
- _add_product_mass_constraint()[source]
The product mass is bounded by the sum of all nutriments and the remaining water
- _add_total_leaves_percentage_constraint()[source]
The sum of the percentages of all leaf ingredients must be 100%.
- _add_total_subingredients_percentages_constraint(ingredient)[source]
Recursive function to add for each compound ingredient the constraint that its percentage must equal the sum of the percentages of its subingredients.
- Parameters
ingredient (dict) – Dict corresponding to a compound ingredient.
- _add_used_mass_constraint()[source]
Adding the constraint that the total used mass of ingredients is bounded by the evaporation coefficient.
- _get_variable_bounds(variable)[source]
Use the solver to find the ingredient’s lower and upper bound.
- Parameters
variable (Variable) – Solver variable
- Returns
Tuple containing ingredient lower and upper bounds
- Return type
tuple
- _optimize_variable(variable, direction='minimize')[source]
Optimize the model and return the variable value.
- Parameters
variable (Variable) – Variable to optimize
direction (str) – ‘minimize’ or ‘maximize’
- Returns
Value of the optimized variable.
- Return type
float
- _pick_proportion(ingredient_name, inf, sup)[source]
Chooses a random proportion for this ingredient.
Uses a reference percentage distribution if the distribution of this ingredient in this interval has enough data; otherwise uses a uniform distribution.
- Parameters
ingredient_name (str) –
inf (float) – Lower bound
sup (float) – Upper bound
- Returns
Proportion of this ingredient
- Return type
float
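The choice between the reference distribution and the uniform fallback might look like the sketch below. The function name, the explicit reference_distribution argument and the way a value is drawn from the reference data (a random observed value) are assumptions; the package may sample differently.

```python
import random


def pick_proportion(reference_distribution, inf, sup, min_prct_dist_size=30,
                    rng=random):
    """Pick a proportion in [inf, sup].

    reference_distribution: observed percentages for the ingredient
    (hypothetical input; may be empty).
    """
    # Adjust the distribution to the interval of possible values.
    candidates = [p for p in reference_distribution if inf <= p <= sup]
    if len(candidates) >= min_prct_dist_size:
        # Assumption: draw one of the observed values at random.
        return rng.choice(candidates)
    # Not enough data in the interval: fall back to a uniform draw.
    return rng.uniform(inf, sup)
```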
- _pick_total_mass(proportions, use_nutritional_info)[source]
Choosing the total mass of ingredients used by maximizing the confidence score of the resulting recipe.
- Parameters
proportions (dict) – Proportions of the ingredients.
- Returns
Total mass of ingredients used in g.
- Return type
float
- _remove_decreasing_order_constraint_from_rank(rank)[source]
Removes the decreasing proportion order constraint for all ingredients from the given rank. If an ingredient is below a certain proportion (2% in EU regulation), it may not be indicated in decreasing proportion order.
- Parameters
rank (int) – Rank of the ingredient from which the decreasing proportion order constraint shall be replaced by a maximum proportion constraint.
- random_recipe(use_nutritional_info=True)[source]
Create a possible recipe of a product given its ingredient list and nutritional data. The recipe is given for 100g of final product.
Notes
The recipe of the product is estimated randomly. To do this, a linear programming solver is defined with these constraints:
- the sum of all ingredients percentages must be 100;
- the ingredients percentages are given in decreasing order;
- the nutritional composition of the product is the sum of the nutritional composition of its ingredients (with an error margin specified by nutritional_info_precision);
- some ingredients may have a defined percentage.
Once the solver has been set up, the algorithm loops through the ingredients in random order and computes the interval of possible values of each one using the solver. Once this interval is known, it picks a random proportion for the ingredient within the interval and adds this value as a new constraint for the solver. If the ingredient has a reference percentage distribution (computed from existing OFF data), the random value is picked following this distribution. If not, a uniform distribution is used. Once all ingredient proportions have been defined, the same operation is done on the total mass used variable, by maximizing the confidence score of the resulting recipe.
- Returns
Dictionary containing a possible recipe with ingredients ids as keys and masses in g as values.
- Return type
dict
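The pick-then-constrain idea can be tried without a solver in a much simpler setting. The sketch below only keeps two of the constraints (percentages sum to 100 and are in decreasing order), processes ingredients in rank order instead of random order, uses closed-form bounds in place of solver calls and always draws uniformly. The function name is hypothetical and this is not how the package computes bounds.

```python
import random


def random_decreasing_proportions(nb_ingredients, rng=random):
    """Draw percentages p1 >= p2 >= ... >= pn that sum to 100.

    Simplified stand-in for the solver loop: ingredients are processed
    in rank order and bounds come from a closed form, not from a solver.
    """
    proportions = []
    remaining = 100.0
    previous = 100.0
    for rank in range(nb_ingredients):
        left = nb_ingredients - rank  # ingredients still to be assigned
        if left == 1:
            value = remaining  # the last ingredient takes what is left
        else:
            # Largest of the remaining ingredients: at least an equal share,
            # at most the previous proportion or everything that is left.
            value = rng.uniform(remaining / left, min(previous, remaining))
        proportions.append(value)
        remaining -= value
        previous = value
    return proportions
```

Each drawn value narrows the interval available to the following ingredients, which is the role played by the constraint added to the solver after each pick.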
- static recipe_from_proportions(proportions, total_mass)[source]
Returns a recipe from ingredients proportions and a total mass. Sums masses of ingredients used multiple times.
- Parameters
proportions (dict) –
total_mass (float) –
- Returns
- Return type
dict
Examples
>>> RandomRecipeCreator.recipe_from_proportions({'en:egg':0.7, 'en:flour': 0.3}, 150)
{'en:flour': 45.0, 'en:egg':105.0}
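A minimal sketch of the same computation is given below, under the assumption that an ingredient used several times appears under ids that differ only by trailing ‘*’ characters (as produced by individualize_ingredients). The standalone function is an illustration, not the package's static method.

```python
def recipe_from_proportions(proportions, total_mass):
    """Turn proportions (fractions of the total mass) into masses in g."""
    recipe = {}
    for ingredient_id, proportion in proportions.items():
        # Assumption: duplicated ingredients carry trailing '*' markers.
        base_id = ingredient_id.rstrip('*')
        recipe[base_id] = recipe.get(base_id, 0.0) + proportion * total_mass
    return recipe
```

With this sketch, 'en:water' and 'en:water*' end up as a single 'en:water' entry whose mass is the sum of both.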
- class impacts_estimation.impacts_estimation.RecipeImpactCalculator(recipe, impact_name, use_uncertainty=False)[source]
Bases: object
- _define_ingredients_impacts()[source]
Getting the impact of each ingredient. If the ingredient has no uncertainty parameters or use_uncertainty is set to False, simply use the default value. Else pick a value using the uncertainty parameters.
Returns the share of the recipe impact that is due to the given ingredient
- impacts_estimation.impacts_estimation.estimate_impacts(product, impact_names, quantity=100, ignore_unknown_ingredients=True, min_run_nb=30, max_run_nb=1000, forced_run_nb=None, confidence_interval_width=0.05, confidence_level=0.95, use_nutritional_info=True, const_relax_coef=0, use_defined_prct=True, maximum_evaporation=0.4, total_mass_used=None, min_prct_dist_size=30, dual_gap_type='absolute', dual_gap_limit=0.001, solver_time_limit=60, time_limit_dual_gap_limit=0.01, confidence_weighting=True, use_ingredients_impact_uncertainty=True, quantiles_points=('0.05', '0.25', '0.5', '0.75', '0.95'), distributions_as_result=False, confidence_score_weighting_factor=10, safe_mode=True)[source]
Wrapper for impact estimation.
- Parameters
product (dict) – Dict containing an Open Food Facts product. It must contain the keys “ingredients” and “nutriments”.
impact_names (str or list) – Iterable containing impacts names or single impact name.
quantity (float) – Quantity of product in grams for which the impact must be calculated. Default is 100g.
ignore_unknown_ingredients (bool) – Should ingredients absent from the OFF taxonomy and without a defined percentage be considered as parsing errors and ignored?
min_run_nb (int) – Minimum number of runs for the Monte-Carlo loop. Too small a number may result in a falsely converging value.
max_run_nb (int) – Maximum number of runs for the Monte-Carlo loop.
forced_run_nb (int) – Used to bypass natural Monte-Carlo stopping criteria and force the number of runs
confidence_interval_width (float) – Width of the confidence interval that will determine the convergence detection.
confidence_level (float) – Confidence level of the confidence interval.
use_nutritional_info (bool) – Should nutritional information be used to estimate recipe?
const_relax_coef (float) – Constraints relaxation coefficient. Allows constraints on nutriments, water and mass balance to be relaxed to increase the chances of getting a result.
use_defined_prct (bool) – Should ingredients percentages defined in the product be used?
maximum_evaporation (float) – Upper bound of the evaporation coefficient, in [0, 1), i.e. the maximum proportion of the ingredients’ water that can evaporate.
total_mass_used (float) – Total mass of ingredients used in grams, if known.
min_prct_dist_size (int) – Minimum size of the ingredients percentage distribution that will be used to pick a proportion for an ingredient. If the distribution (adjusted to the interval of possible values) has fewer data points, a uniform distribution will be used instead.
dual_gap_type (str) – ‘absolute’ or ‘relative’. Determines the precision type of the variable optimization by the solver.
dual_gap_limit (float) – Determines the precision of the variable optimization by the solver. Relative or absolute according to dual_gap_type.
solver_time_limit (float) – Maximum time for the solver optimization (in seconds). Set to None or 0 to set no limit.
time_limit_dual_gap_limit (float) – Accepted precision of the solver in case of time limit hit. Relative or absolute according to dual_gap_type.
confidence_weighting (bool) – Should the recipes be weighted by their confidence score (deviation of the recipe’s nutritional composition from the reference product)?
use_ingredients_impact_uncertainty (bool) – Should ingredients impacts uncertainty data be used?
quantiles_points (iterable) – List of impacts quantiles cutting points to return in the result.
distributions_as_result (bool) – Should the recipes, the distributions of the impact, the mean confidence interval and the confidence score be added to the result?
confidence_score_weighting_factor (float) – Weighting factor used for the confidence score calculation. It corresponds to the weight of the nutritional distance against the absolute difference between the total mass and 100g/100g.
safe_mode (bool) – If set to True, the constraints will be progressively relaxed in order to get a result.
- impacts_estimation.impacts_estimation.estimate_impacts_safe(product, impact_names, **kwargs)[source]
- impacts_estimation.impacts_estimation.impact_from_recipe(recipe, impact_name, use_uncertainty=False)[source]
Wrapper for RecipeImpactCalculator
Functions used by the environmental impact estimation program
- impacts_estimation.utils.agribalyse_impact_name_i18n(impact_name)[source]
Returns the French version of an impact name
- Parameters
impact_name (str) –
Examples
>>> agribalyse_impact_name_i18n('Climate change')
'Changement climatique'
>>> agribalyse_impact_name_i18n("Appauvrissement de la couche d'ozone")
"Appauvrissement de la couche d'ozone"
- impacts_estimation.utils.clear_ingredient_graph(product)[source]
Recursive function to search the ingredients graph and remove subingredients if all subingredients of a same ingredient are uncharacterized
- Parameters
product (dict) – Dict corresponding to a product or a compound ingredient.
- impacts_estimation.utils.confidence_score(nutri, reference_nutri, total_mass, min_possible_mass, max_possible_mass, weighting_factor=10, reference_mass=100)[source]
Calculate the confidence score of a nutritional composition using the Euclidean distance between the reference nutritional composition and the assessed nutritional composition in the space of all considered nutriment contents and the total mass of ingredients used. The closer the nutritional composition is to the reference, the higher the confidence score. The closer the total mass of ingredients is to 100g/100g, the higher the confidence score.
The score is defined as the inverse of the sum of the nutritional distance and the absolute difference between the total mass and 100g/100g, weighted by a weighting factor.
- Parameters
nutri (dict) – Nutritional composition to evaluate.
reference_nutri (dict) – Nutritional composition of the reference product.
total_mass (float) – Total mass of ingredients used in g.
min_possible_mass (float) – Minimum possible total ingredient mass for a product in g
max_possible_mass (float) – Maximum possible total ingredient mass for a product in g
weighting_factor (float) – Weight of the nutritional distance against the absolute difference between the total mass and 100g/100g.
reference_mass (float) – Mass for which the nutritional compositions are expressed (in g).
- Returns
Confidence score
- Return type
float
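The definition above leaves some details open, so the sketch below fixes them by assumption: the weighting factor multiplies the nutritional distance, nutriment contents are compared without normalization, the mass gap is taken relative to reference_mass, and min_possible_mass and max_possible_mass are ignored. The function name is hypothetical and the actual formula may differ.

```python
import math


def confidence_score_sketch(nutri, reference_nutri, total_mass,
                            weighting_factor=10, reference_mass=100):
    """Higher score = closer to the reference composition and mass."""
    # Euclidean distance over the nutriments of the reference product.
    nutritional_distance = math.sqrt(sum(
        (nutri.get(name, 0.0) - reference) ** 2
        for name, reference in reference_nutri.items()
    ))
    # Relative gap between the mass used and the reference mass.
    mass_gap = abs(total_mass - reference_mass) / reference_mass
    penalty = weighting_factor * nutritional_distance + mass_gap
    # A perfect match has no penalty; report it as an infinite score.
    return math.inf if penalty == 0 else 1 / penalty
```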
- impacts_estimation.utils.define_subingredients_percentage_type(product)[source]
Recursive function to search the ingredients graph and define if the subingredients percentages are defined as percentage of their parent ingredient or the whole product.
- Parameters
product (dict) – Dict corresponding to a product or a compound ingredient.
- impacts_estimation.utils.find_ingredients_graph_leaves(product)[source]
Recursive function to search the ingredients graph and find its leaves.
- Parameters
product (dict) – Dict corresponding to a product or a compound ingredient.
- Returns
List containing the ingredients graph leaves.
- Return type
list
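A minimal sketch of such a leaf search, assuming that compound ingredients store their subingredients under an 'ingredients' key (as in the individualize_ingredients example) and using a hypothetical function name:

```python
def find_leaves(product):
    """Return the leaf ingredients of a product or compound ingredient."""
    leaves = []
    for ingredient in product.get('ingredients', []):
        if ingredient.get('ingredients'):
            # Compound ingredient: only its own leaves are kept.
            leaves.extend(find_leaves(ingredient))
        else:
            leaves.append(ingredient)
    return leaves
```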
- impacts_estimation.utils.flat_ingredients_list_BFS(product)[source]
Recursive function to search the ingredients graph by doing a Breadth First Search and return it as a flat list of all nodes. Sub ingredients are placed at the end of the list.
- Parameters
product (dict) – Dict corresponding to a product or a compound ingredient.
- Returns
List containing all the ingredients graph nodes.
- Return type
list
- impacts_estimation.utils.flat_ingredients_list_DFS(product)[source]
Recursive function to search the ingredients graph by doing a Depth First Search and return it as a flat list of all nodes. Sub ingredients are placed right after their parents.
- Parameters
product (dict) – Dict corresponding to a product or a compound ingredient.
- Returns
List containing all the ingredients graph nodes.
- Return type
list
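The difference between the two traversal orders is easiest to see side by side. The sketch below uses hypothetical function names, assumes subingredients are stored under an 'ingredients' key, returns ingredient nodes only (not the product itself), and implements the breadth-first version with a queue rather than recursion.

```python
from collections import deque


def flat_list_dfs(product):
    """Depth-first: subingredients come right after their parent."""
    nodes = []
    for ingredient in product.get('ingredients', []):
        nodes.append(ingredient)
        nodes.extend(flat_list_dfs(ingredient))
    return nodes


def flat_list_bfs(product):
    """Breadth-first: subingredients come after all shallower nodes."""
    nodes = []
    queue = deque(product.get('ingredients', []))
    while queue:
        ingredient = queue.popleft()
        nodes.append(ingredient)
        queue.extend(ingredient.get('ingredients', []))
    return nodes
```

For a product with ingredients A (containing A1) and B, the depth-first order is A, A1, B and the breadth-first order is A, B, A1.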
- impacts_estimation.utils.individualize_ingredients(product, previous_ingredients_ids=None)[source]
Process an ingredient list in place to ensure that they all have a different id.
- Parameters
product (dict) – Dict corresponding to a product, containing a list of ingredients, may contain compound ingredients
previous_ingredients_ids (list) – List containing ingredients ids. Needed only for recursive call
Examples
>>> product = {'ingredients': [{'id': 'A'}, {'id': 'B', 'ingredients': [{'id': 'A'}]}, {'id': 'B'}]}
>>> individualize_ingredients(product)
>>> print(product)
{'ingredients': [{'id': 'A'}, {'id': 'B', 'ingredients': [{'id': 'A*'}]}, {'id': 'B*'}]}
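One way to obtain this behaviour is sketched below. It is an illustration with a hypothetical function name: ids already seen are tracked in a set (the documented function takes a list) and a ‘*’ is appended until the id is unique.

```python
def individualize(product, seen_ids=None):
    """Rename duplicated ingredient ids in place by appending '*'."""
    if seen_ids is None:
        seen_ids = set()
    for ingredient in product.get('ingredients', []):
        while ingredient['id'] in seen_ids:
            ingredient['id'] += '*'
        seen_ids.add(ingredient['id'])
        # Recurse so that subingredients share the same set of seen ids.
        individualize(ingredient, seen_ids)
```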
- impacts_estimation.utils.maximum_percentage_sum(ingredients)[source]
Computes the maximum sum of ingredients percentages for ingredients given in decreasing percentage order, even if some ingredients do not have a percentage.
Notes
This is useful to estimate if subingredients percentages are defined in percentage of their parent ingredient or in percentage of the total product.
- Parameters
ingredients (list) – List of dicts corresponding to the ingredients
- Returns
Maximum value of the sum of all ingredients percentages.
- Return type
float
- impacts_estimation.utils.minimum_percentage_sum(ingredients)[source]
Computes the minimum sum of ingredients percentages for ingredients given in decreasing percentage order, even if some ingredients do not have a percentage.
Notes
This is useful to estimate if subingredients percentages are defined in percentage of their parent ingredient or in percentage of the total product.
- Parameters
ingredients (list) – List of dicts corresponding to the ingredients
- Returns
Minimum value of the sum of all ingredients percentages.
- Return type
float
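Both sums can be sketched with one helper, relying only on the decreasing order: an ingredient without a percentage is at most as large as the closest defined percentage before it and at least as large as the closest defined percentage after it. The function name and the 'percent' key are assumptions, and the package may treat edge cases differently.

```python
def percentage_sum_bounds(ingredients):
    """Return (minimum, maximum) sums of percentages.

    Ingredients are in decreasing percentage order; the 'percent' key
    (assumed name) is present only when the percentage is defined.
    """
    minimum = maximum = 0.0
    for index, ingredient in enumerate(ingredients):
        if 'percent' in ingredient:
            minimum += ingredient['percent']
            maximum += ingredient['percent']
            continue
        # At most the closest defined percentage before it (else 100).
        before = [i['percent'] for i in ingredients[:index] if 'percent' in i]
        maximum += before[-1] if before else 100.0
        # At least the closest defined percentage after it (else 0).
        after = [i['percent'] for i in ingredients[index + 1:] if 'percent' in i]
        minimum += after[0] if after else 0.0
    return minimum, maximum
```

The sums are deliberately not capped at 100: for subingredients with percentages 50, undefined, 10 and undefined, the sketch gives a minimum of 70 and a maximum of 120.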
- impacts_estimation.utils.natural_bounds(rank, nb_ingredients)[source]
Computes the upper and lower bounds of the proportion of an ingredient depending on its rank and the number of ingredients in the product given that they are in decreasing proportion order.
Examples
>>> natural_bounds(2, 4)
(0.0, 50.0)
>>> natural_bounds(1, 5)
(20.0, 100.0)
- Parameters
rank (int) – Rank of the ingredient in the list
nb_ingredients (int) – Number of ingredients in the product
- Returns
Lower and upper bounds of the proportion of the ingredient
- Return type
tuple
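The documented examples are consistent with the following reasoning, sketched under a hypothetical name: the first ingredient holds at least an equal share of the product, while an ingredient at rank r shares the mass with r - 1 ingredients that are at least as large, so it cannot exceed 100 / r.

```python
def natural_bounds_sketch(rank, nb_ingredients):
    """Return (lower, upper) percentage bounds for a 1-based rank."""
    if not 1 <= rank <= nb_ingredients:
        raise ValueError('rank must be between 1 and nb_ingredients')
    # Only the first ingredient is guaranteed an equal share or more.
    lower = 100 / nb_ingredients if rank == 1 else 0.0
    # An ingredient at rank r has r - 1 ingredients at least as large.
    upper = 100 / rank
    return lower, upper
```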
- impacts_estimation.utils.nutriments_from_recipe(recipe)[source]
Return the nutriments content of a product recipe by a weighted sum of the ingredients masses and reference nutriment contents.
- Parameters
recipe (dict) – Dict containing ingredients as keys and masses in grams as values
Warning
Any ingredients whose nutriment content is unknown will be considered to have the average nutriment content of the product.
- Returns
Dictionary with nutriments as keys and nutriment contents as values
- Return type
dict
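The weighted sum can be sketched as follows. The function name and the explicit reference_contents table are assumptions (the documented function only takes the recipe), contents are assumed to be expressed in g per 100 g, and the handling of ingredients with unknown nutriment content described in the warning is left out.

```python
def nutriments_from_recipe_sketch(recipe, reference_contents):
    """Weighted sum of reference nutriment contents.

    recipe: ingredient id -> mass in g.
    reference_contents: ingredient id -> {nutriment: g per 100 g}
    (hypothetical reference table passed explicitly).
    """
    totals = {}
    for ingredient_id, mass in recipe.items():
        for nutriment, content in reference_contents[ingredient_id].items():
            totals[nutriment] = totals.get(nutriment, 0.0) + mass * content / 100
    return totals
```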
- impacts_estimation.utils.nutritional_error_margin(nutriment, value)[source]
Returns the error margin of a product’s nutriment according to EU directives
- Parameters
nutriment (str) – Nutriment considered
value (float) – Given product content of the considered nutriment
- Returns
Dictionary containing absolute and relative margins (only one of which is different from 0)
- Return type
dict
Examples
>>> nutritional_error_margin('proteins', 0.05)
{'absolute': 0.02, 'relative': 0}
>>> nutritional_error_margin('proteins', 0.3)
{'absolute': 0, 'relative': 0.2}
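The two documented results match a tiered tolerance for proteins, with contents expressed as mass fractions. The sketch below covers proteins only and uses a hypothetical name; the first two tiers follow the examples, while the cut-offs at 10 and 40 g per 100 g and the third tier come from a reading of the EU guidance on nutrient declaration tolerances and are not confirmed by this page.

```python
def protein_error_margin(value):
    """Tolerance for a protein content given as a fraction of the mass."""
    if value < 0.10:
        # Below 10 g per 100 g: absolute tolerance of 2 g per 100 g.
        return {'absolute': 0.02, 'relative': 0}
    if value <= 0.40:
        # From 10 to 40 g per 100 g: relative tolerance of 20 %.
        return {'absolute': 0, 'relative': 0.2}
    # Above 40 g per 100 g: absolute tolerance of 8 g per 100 g (assumed).
    return {'absolute': 0.08, 'relative': 0}
```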
- impacts_estimation.utils.original_id(individualized_id)[source]
Gets the original id of an ingredient that has been transformed by individualize_ingredients()
- Parameters
individualized_id (str) –
- Returns
- Return type
str
Examples
>>> original_id('en:water**')
'en:water'
>>> original_id('en:sugar')
'en:sugar'
- impacts_estimation.utils.remove_percentage_from_product(product)[source]
Removes the defined percentage of ingredients.
- Parameters
product (dict) –
- impacts_estimation.utils.weighted_geometric_mean(values, weights)[source]
Returns the weighted geometric mean of values.
- Parameters
values (iterable) –
weights (iterable) –
- Returns
- Return type
float
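The weighted geometric mean is the exponential of the weighted arithmetic mean of the logarithms. A minimal sketch, with a hypothetical function name and assuming strictly positive values and a non-zero total weight:

```python
import math


def weighted_geometric_mean_sketch(values, weights):
    """exp(sum(w_i * ln(x_i)) / sum(w_i)) for positive values x_i."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights):
        raise ValueError('values and weights must have the same length')
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)
```

With equal weights this reduces to the ordinary geometric mean, for instance about 10 for the values 1 and 100.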
Exceptions used by the impact estimation program