{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Creating Test Problems\n", "```\n", "Copyright 2025 National Technology & Engineering Solutions of Sandia,\n", "LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the\n", "U.S. Government retains certain rights in this software.\n", "```" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "We demonstrate how to use the `create_problem` function to create test problems for decomposition algorithms." ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": {}, "outputs": [], "source": [ "import pyttb as ttb\n", "from pyttb.create_problem import (\n", "    CPProblem,\n", "    ExistingCPSolution,\n", "    TuckerProblem,\n", "    MissingData,\n", "    create_problem,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "# Set global random seed for reproducibility of this notebook\n", "import numpy as np\n", "\n", "np.random.seed(123)" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## Create a CP test problem\n", "The `create_problem` function generates both the solution (as a `ktensor` for CP) and the test data (as a dense `tensor`)." 
] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "# Create a problem\n", "cp_specific_params = CPProblem(shape=(5, 4, 3), num_factors=3, noise=0.1)\n", "no_missing_data = MissingData()\n", "solution, data = create_problem(cp_specific_params, no_missing_data)" ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "# Display the solution\n", "print(solution)" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "# Display the data\n", "print(data)" ] }, { "cell_type": "code", "execution_count": null, "id": "8", "metadata": {}, "outputs": [], "source": [ "# The difference between the true solution and measured data\n", "# should match the specified noise setting\n", "diff = (solution.full() - data).norm() / solution.full().norm()\n", "print(diff)" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "## Creating a Tucker test problem\n", "The `create_problem` function can also create Tucker problems when a `TuckerProblem` data class is passed as its first argument. In this case, the function generates the solution as a `ttensor`." 
] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": {}, "outputs": [], "source": [ "tucker_specific_params = TuckerProblem(\n", "    shape=(5, 4, 3), num_factors=[3, 3, 2], noise=0.1\n", ")\n", "no_missing_data = MissingData()\n", "solution, data = create_problem(tucker_specific_params, no_missing_data)" ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "# Display the solution\n", "print(solution)" ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "# Display the data\n", "print(data)" ] }, { "cell_type": "code", "execution_count": null, "id": "13", "metadata": {}, "outputs": [], "source": [ "# The difference between the true solution and measured data\n", "# should match the specified noise setting\n", "diff = (solution.full() - data).norm() / solution.full().norm()\n", "print(diff)" ] }, { "cell_type": "markdown", "id": "14", "metadata": {}, "source": [ "## Recreating the same test problem\n", "We are still relying on NumPy's legacy global random state. 
See [#441](https://github.com/sandialabs/pyttb/issues/441). Until that issue is resolved, reset the seed before each `create_problem` call to reproduce a problem." ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": {}, "outputs": [], "source": [ "# Problem details\n", "shape = [5, 4, 3]\n", "num_factors = 3\n", "seed = 123\n", "missing_params = MissingData()\n", "cp_specific_params = CPProblem(shape, num_factors=num_factors)" ] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": {}, "outputs": [], "source": [ "# Generate the first test problem\n", "np.random.seed(seed)\n", "solution_1, data_1 = create_problem(cp_specific_params, missing_params)" ] }, { "cell_type": "code", "execution_count": null, "id": "17", "metadata": {}, "outputs": [], "source": [ "# Generate the second test problem\n", "np.random.seed(seed)\n", "solution_2, data_2 = create_problem(cp_specific_params, missing_params)" ] }, { "cell_type": "code", "execution_count": null, "id": "18", "metadata": {}, "outputs": [], "source": [ "# Check that the solutions are identical\n", "print(f\"{solution_1.isequal(solution_2)=}\")\n", "\n", "# Check that the data are identical\n", "print(f\"{(data_1-data_2).norm()=}\")" ] }, { "cell_type": "markdown", "id": "19", "metadata": {}, "source": [ "## Options for creating factor matrices, core tensors, and weights\n", "\n", "User-specified functions may be provided to generate the relevant components of `ktensors` or `ttensors`." 
] }, { "cell_type": "code", "execution_count": null, "id": "20", "metadata": {}, "outputs": [], "source": [ "# Example custom weight generator for CP Problems\n", "cp_specific_params = CPProblem(shape=[5, 4, 3], num_factors=2, weight_generator=np.ones)\n", "solution, _ = create_problem(cp_specific_params, missing_params)\n", "print(f\"{solution.weights}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "21", "metadata": {}, "outputs": [], "source": [ "# Example custom core generator for Tucker\n", "tucker_specific_params = TuckerProblem(\n", "    shape=[5, 4, 3], num_factors=[2, 2, 2], core_generator=ttb.tenones\n", ")\n", "solution, _ = create_problem(tucker_specific_params, missing_params)\n", "print(f\"{solution.core}\")" ] }, { "cell_type": "markdown", "id": "22", "metadata": {}, "source": [ "## Create dense missing data problems\n", "Problems can be created with a specified fraction of missing data. The problem generator randomly creates the pattern of missing data." ] }, { "cell_type": "code", "execution_count": null, "id": "23", "metadata": {}, "outputs": [], "source": [ "# Specify 25% missing data\n", "missing_data_params = MissingData(missing_ratio=0.25)\n", "\n", "# Show an example of a randomly generated pattern\n", "# 1 is known, 0 is unknown\n", "print(missing_data_params.get_pattern(shape=[5, 4, 3]))" ] }, { "cell_type": "code", "execution_count": null, "id": "24", "metadata": {}, "outputs": [], "source": [ "# Generate problem using a newly sampled pattern\n", "solution, data = create_problem(cp_specific_params, missing_data_params)" ] }, { "cell_type": "code", "execution_count": null, "id": "25", "metadata": {}, "outputs": [], "source": [ "# Show data (including noise) with missing entries zeroed out\n", "print(data)" ] }, { "cell_type": "markdown", "id": "26", "metadata": {}, "source": [ "## Creating sparse missing data problems\n", "If `sparse_model` is set to `True`, the returned data is sparse. 
This should only be used with `missing_ratio` >= 0.8." ] }, { "cell_type": "code", "execution_count": null, "id": "27", "metadata": {}, "outputs": [], "source": [ "missing_data_params = MissingData(missing_ratio=0.8, sparse_model=True)\n", "\n", "# Here is a candidate pattern of known data\n", "print(missing_data_params.get_pattern([5, 4, 3]))" ] }, { "cell_type": "code", "execution_count": null, "id": "28", "metadata": {}, "outputs": [], "source": [ "# Here is the data (including noise) with zeros not explicitly represented.\n", "solution, data = create_problem(cp_specific_params, missing_data_params)\n", "print(data)" ] }, { "cell_type": "markdown", "id": "29", "metadata": {}, "source": [ "## Create missing data problems with pre-specified pattern\n", "A specific pattern (dense or sparse) can be used to represent missing data. This is also currently the recommended approach for reproducibility." ] }, { "cell_type": "code", "execution_count": null, "id": "30", "metadata": {}, "outputs": [], "source": [ "# Grab a pattern from before\n", "pattern = MissingData(missing_ratio=0.25).get_pattern([5, 4, 3])\n", "missing_data_params = MissingData(missing_pattern=pattern)\n", "solution, data = create_problem(cp_specific_params, missing_data_params)\n", "print(data)" ] }, { "cell_type": "markdown", "id": "31", "metadata": {}, "source": [ "## Creating Sparse Problems (CP only)\n", "If we assume each model parameter is the input to a Poisson process, then we can generate a sparse test problem. This requires that all the factor matrices and lambda be nonnegative. The default factor generator ('randn') won't work since it produces both positive and negative values." 
] }, { "cell_type": "code", "execution_count": null, "id": "32", "metadata": {}, "outputs": [], "source": [ "# Generate factor matrices with a few large entries in each column\n", "# This will be the basis of our solution\n", "shape = (20, 15, 10)\n", "num_factors = 4\n", "A = []\n", "for n in range(len(shape)):\n", "    A.append(np.random.rand(shape[n], num_factors))\n", "    for r in range(num_factors):\n", "        p = np.random.permutation(np.arange(shape[n]))\n", "        idx = p[1 : round(0.2 * shape[n])]\n", "        A[n][idx, r] *= 10\n", "S = ttb.ktensor(A)\n", "# S.normalize(sort=True);" ] }, { "cell_type": "code", "execution_count": null, "id": "33", "metadata": {}, "outputs": [], "source": [ "S.normalize(sort=True).weights" ] }, { "cell_type": "code", "execution_count": null, "id": "34", "metadata": {}, "outputs": [], "source": [ "# Create sparse test problem based on the solution.\n", "# `sparse_generation` controls how many insertions to make based on the solution.\n", "# The weight vector of the solution is automatically rescaled to match the number of insertions.\n", "existing_params = ExistingCPSolution(S, noise=0.0, sparse_generation=500)\n", "print(f\"{S.weights=}\")\n", "solution, data = create_problem(existing_params)\n", "print(\n", "    f\"num_nonzeros: {data.nnz}\\n\"\n", "    f\"total_insertions: {np.sum(data.vals)}\\n\"\n", "    f\"original weights vs rescaled: {S.weights / solution.weights}\"\n", ")" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 5 }