{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Creating Test Problems\n",
    "```\n",
    "Copyright 2025 National Technology & Engineering Solutions of Sandia,\n",
    "LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the\n",
    "U.S. Government retains certain rights in this software.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "This notebook demonstrates how to use the `create_problem` function to generate test problems for tensor decomposition algorithms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pyttb as ttb\n",
    "from pyttb.create_problem import (\n",
    "    CPProblem,\n",
    "    ExistingCPSolution,\n",
    "    TuckerProblem,\n",
    "    MissingData,\n",
    "    create_problem,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set global random seed for reproducibility of this notebook\n",
    "import numpy as np\n",
    "\n",
    "np.random.seed(123)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "## Create a CP test problem\n",
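    "The `create_problem` function generates both the solution (as a `ktensor` for CP) and the test data (as a dense `tensor`). The first argument holds the problem-specific parameters (here a `CPProblem`), and the second describes the missing data (here `MissingData()`, i.e., no missing entries)."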
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a problem\n",
    "cp_specific_params = CPProblem(shape=(5, 4, 3), num_factors=3, noise=0.1)\n",
    "no_missing_data = MissingData()\n",
    "solution, data = create_problem(cp_specific_params, no_missing_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display the solution\n",
    "print(solution)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display the data\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The relative difference between the true solution and the noisy data\n",
    "# should match the specified noise level\n",
    "diff = (solution.full() - data).norm() / solution.full().norm()\n",
    "print(diff)"
   ]
  },
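  {
   "cell_type": "markdown",
   "id": "8a",
   "metadata": {},
   "source": [
    "The check above relies on how the noise is scaled. As a minimal sketch, assume (our assumption, not a documented guarantee) that a Gaussian noise tensor $N$ is rescaled relative to the true tensor $X$ so that the data $D$ satisfy\n",
    "$$D = X + \\eta \\, \\frac{\\|X\\|}{\\|N\\|} N \\quad\\Longrightarrow\\quad \\frac{\\|D - X\\|}{\\|X\\|} = \\eta,$$\n",
    "where $\\eta$ plays the role of the `noise` parameter. The NumPy-only cell below illustrates this convention on random stand-in data; it does not call pyttb."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# NumPy-only sketch of an assumed noise convention (not pyttb's implementation)\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)  # local generator; leaves the global seed alone\n",
    "eta = 0.1  # plays the role of the `noise` parameter\n",
    "X = rng.standard_normal((5, 4, 3))  # stand-in for solution.full()\n",
    "N = rng.standard_normal(X.shape)  # unscaled Gaussian noise\n",
    "D = X + eta * np.linalg.norm(X) / np.linalg.norm(N) * N\n",
    "rel_err = np.linalg.norm(D - X) / np.linalg.norm(X)\n",
    "print(rel_err)  # equals eta up to floating-point rounding"
   ]
  },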
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "## Creating a Tucker test problem\n",
    "The `create_problem` function can also create Tucker problems: pass a `TuckerProblem` data class as the first argument to `create_problem` instead. In this case, the function generates the solution as a `ttensor`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "tucker_specific_params = TuckerProblem(\n",
    "    shape=(5, 4, 3), num_factors=[3, 3, 2], noise=0.1\n",
    ")\n",
    "no_missing_data = MissingData()\n",
    "solution, data = create_problem(tucker_specific_params, no_missing_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display the solution\n",
    "print(solution)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display the data\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The relative difference between the true solution and the noisy data\n",
    "# should match the specified noise level\n",
    "diff = (solution.full() - data).norm() / solution.full().norm()\n",
    "print(diff)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "## Recreating the same test problem\n",
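    "To recreate a test problem exactly, reset NumPy's random seed before each call to `create_problem`. This works because `create_problem` currently relies on NumPy's legacy global random state (`np.random.seed`); see [#441](https://github.com/sandialabs/pyttb/issues/441)."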
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Problem details\n",
    "shape = [5, 4, 3]\n",
    "num_factors = 3\n",
    "seed = 123\n",
    "missing_params = MissingData()\n",
    "cp_specific_params = CPProblem(shape, num_factors=num_factors)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate the first test problem\n",
    "np.random.seed(seed)\n",
    "solution_1, data_1 = create_problem(cp_specific_params, missing_params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate the second test problem\n",
    "np.random.seed(seed)\n",
    "solution_2, data_2 = create_problem(cp_specific_params, missing_params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check that the solutions are identical\n",
    "print(f\"{solution_1.isequal(solution_2)=}\")\n",
    "\n",
    "# Check that the data are identical\n",
    "print(f\"{(data_1 - data_2).norm()=}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "## Options for creating factor matrices, core tensors, and weights\n",
    "\n",
    "User-specified functions can be provided to generate the components of a `ktensor` (e.g., its weights via `weight_generator`) or a `ttensor` (e.g., its core via `core_generator`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example custom weight generator for CP Problems\n",
    "cp_specific_params = CPProblem(shape=[5, 4, 3], num_factors=2, weight_generator=np.ones)\n",
    "solution, _ = create_problem(cp_specific_params, missing_params)\n",
    "print(f\"{solution.weights}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example custom core generator for Tucker\n",
    "tucker_specific_params = TuckerProblem(\n",
    "    shape=[5, 4, 3], num_factors=[2, 2, 2], core_generator=ttb.tenones\n",
    ")\n",
    "solution, _ = create_problem(tucker_specific_params, missing_params)\n",
    "print(f\"{solution.core}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22",
   "metadata": {},
   "source": [
    "## Create dense missing data problems\n",
    "It's possible to create problems in which a given fraction of the entries is missing (set via `missing_ratio`). The problem generator randomly creates the pattern of missing entries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Specify 25% missing data\n",
    "missing_data_params = MissingData(missing_ratio=0.25)\n",
    "\n",
    "# Show an example of randomly generated pattern\n",
    "# 1 = known entry, 0 = missing entry\n",
    "print(missing_data_params.get_pattern(shape=[5, 4, 3]))"
   ]
  },
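  {
   "cell_type": "markdown",
   "id": "23a",
   "metadata": {},
   "source": [
    "One simple way to draw such a pattern is to mark an exact number of randomly chosen entries as missing. The NumPy-only cell below sketches this idea. `random_missing_pattern` is a hypothetical helper that returns a plain NumPy array, and we do not claim it mirrors how `MissingData.get_pattern` is implemented."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def random_missing_pattern(shape, missing_ratio, rng):\n",
    "    # Hypothetical helper: 1 = known entry, 0 = missing entry\n",
    "    total = int(np.prod(shape))\n",
    "    num_missing = int(round(missing_ratio * total))\n",
    "    pattern = np.ones(total, dtype=int)\n",
    "    # Mark a random subset of distinct flat indices as missing\n",
    "    pattern[rng.permutation(total)[:num_missing]] = 0\n",
    "    return pattern.reshape(shape)\n",
    "\n",
    "\n",
    "pattern_sketch = random_missing_pattern((5, 4, 3), 0.25, np.random.default_rng(0))\n",
    "print(pattern_sketch.sum(), 'known entries out of', pattern_sketch.size)  # 45 known entries out of 60"
   ]
  },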
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate problem using a newly sampled pattern\n",
    "solution, data = create_problem(cp_specific_params, missing_data_params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show data (including noise) with missing entries zeroed out\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "## Creating sparse missing data problems\n",
    "If `sparse_model` is set to `True`, the returned data is a sparse tensor. This option should only be used with `missing_ratio >= 0.8`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "missing_data_params = MissingData(missing_ratio=0.8, sparse_model=True)\n",
    "\n",
    "# Here is a candidate pattern of known data\n",
    "print(missing_data_params.get_pattern([5, 4, 3]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here is the data (including noise) with zeros not explicitly represented.\n",
    "solution, data = create_problem(cp_specific_params, missing_data_params)\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "## Create missing data problems with pre-specified pattern\n",
    "A specific pattern (dense or sparse) can be used to represent missing data. This is also currently the recommended approach for reproducibility."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sample a pattern with 25% missing entries (a stored pattern could be used instead)\n",
    "pattern = MissingData(missing_ratio=0.25).get_pattern([5, 4, 3])\n",
    "missing_data_params = MissingData(missing_pattern=pattern)\n",
    "solution, data = create_problem(cp_specific_params, missing_data_params)\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31",
   "metadata": {},
   "source": [
    "## Creating sparse problems (CP only)\n",
    "If we assume each entry of the model is the input to a Poisson process, then we can generate a sparse test problem. This requires that all the factor matrices and the weights (lambda) be nonnegative. The default factor generator ('randn') won't work since it produces both positive and negative values."
   ]
  },
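  {
   "cell_type": "markdown",
   "id": "31a",
   "metadata": {},
   "source": [
    "Conditioned on the total number of insertions, independent Poisson counts whose means are the model entries follow a multinomial distribution with probabilities proportional to those entries. The NumPy-only cell below sketches that idea for a nonnegative CP model: normalize each factor column into a distribution, then for each insertion pick a component and one index per mode. This is our illustration of the general technique (`sample_counts` is a hypothetical helper), not necessarily how pyttb's `sparse_generation` option is implemented."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def sample_counts(weights, factors, num_insertions, rng):\n",
    "    # Hypothetical helper: draw integer counts from a nonnegative CP model\n",
    "    col_sums = [F.sum(axis=0) for F in factors]\n",
    "    # Normalize each factor column into a probability distribution over indices\n",
    "    probs = [F / s for F, s in zip(factors, col_sums)]\n",
    "    # Absorb the column sums into the weights so the model itself is unchanged\n",
    "    lam = weights * np.prod(col_sums, axis=0)\n",
    "    comp_probs = lam / lam.sum()\n",
    "    counts = np.zeros(tuple(F.shape[0] for F in factors), dtype=int)\n",
    "    # For each insertion, pick a component, then one index per mode\n",
    "    for r in rng.choice(len(lam), size=num_insertions, p=comp_probs):\n",
    "        idx = tuple(rng.choice(P.shape[0], p=P[:, r]) for P in probs)\n",
    "        counts[idx] += 1\n",
    "    return counts\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "factors = [rng.random((n, 2)) for n in (6, 5, 4)]  # nonnegative factors\n",
    "counts = sample_counts(np.array([2.0, 1.0]), factors, 200, rng)\n",
    "print(counts.sum(), 'insertions in', np.count_nonzero(counts), 'nonzero entries')"
   ]
  },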
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate factor matrices with a few large entries in each column\n",
    "# This will be the basis of our solution\n",
    "shape = (20, 15, 10)\n",
    "num_factors = 4\n",
    "A = []\n",
    "for n in range(len(shape)):\n",
    "    A.append(np.random.rand(shape[n], num_factors))\n",
    "    for r in range(num_factors):\n",
    "        p = np.random.permutation(np.arange(shape[n]))\n",
    "        idx = p[: round(0.2 * shape[n])]  # a random 20% of the rows\n",
    "        A[n][idx, r] *= 10\n",
    "S = ttb.ktensor(A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33",
   "metadata": {},
   "outputs": [],
   "source": [
    "S.normalize(sort=True).weights"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create sparse test problem based on the solution.\n",
    "# `sparse_generation` controls how many insertions to make based on the solution.\n",
    "# The weight vector of the solution is automatically rescaled to match the number of insertions.\n",
    "existing_params = ExistingCPSolution(S, noise=0.0, sparse_generation=500)\n",
    "print(f\"{S.weights=}\")\n",
    "solution, data = create_problem(existing_params)\n",
    "print(\n",
    "    f\"num_nonzeros: {data.nnz}\\n\"\n",
    "    f\"total_insertions: {np.sum(data.vals)}\\n\"\n",
    "    f\"original / rescaled weights: {S.weights / solution.weights}\"\n",
    ")"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}