# Creating Test Problems
```
Copyright 2025 National Technology & Engineering Solutions of Sandia,
LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the
U.S. Government retains certain rights in this software.
```

We demonstrate how to use the `create_problem` function to create test problems for decomposition algorithms. 

In [None]:
import pyttb as ttb
from pyttb.create_problem import (
 CPProblem,
 ExistingCPSolution,
 TuckerProblem,
 MissingData,
 create_problem,
)

In [None]:
# Set global random seed for reproducibility of this notebook
import numpy as np

np.random.seed(123)

## Create a CP test problem
The `create_problem` function generates both the solution (as a `ktensor` for CP) and the test data (as a dense `tensor`).

In [None]:
# Create a problem
cp_specific_params = CPProblem(shape=(5, 4, 3), num_factors=3, noise=0.1)
no_missing_data = MissingData()
solution, data = create_problem(cp_specific_params, no_missing_data)

In [None]:
# Display the solution
print(solution)

In [None]:
# Display the data
print(data)

In [None]:
# The difference between the true solution and measured data
# should match the specified noise setting
diff = (solution.full() - data).norm() / solution.full().norm()
print(diff)
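
The relationship checked above can be reproduced with plain NumPy. This is a minimal sketch, not pyttb's implementation: it assumes the noise is a Gaussian perturbation rescaled so that its norm is exactly `noise` times the norm of the noise-free tensor. The names `clean`, `perturbation`, and `noisy` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for solution.full(); any dense array works for this illustration
clean = rng.standard_normal((5, 4, 3))
noise_level = 0.1

# Rescale a Gaussian perturbation so its norm is noise_level * ||clean||
# (assumed scaling rule, chosen to match the check in the cell above)
perturbation = rng.standard_normal(clean.shape)
scale = noise_level * np.linalg.norm(clean) / np.linalg.norm(perturbation)
noisy = clean + scale * perturbation

# Relative difference between the clean and noisy tensors
relative_diff = np.linalg.norm(clean - noisy) / np.linalg.norm(clean)
print(relative_diff)
```

Under this scaling the relative difference equals the noise level up to floating-point error, regardless of the random draw.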

## Creating a Tucker test problem
The `create_problem` function can also create Tucker problems when a `TuckerProblem` data class is passed as its first argument. In this case, the function generates the solution as a `ttensor`.

In [None]:
tucker_specific_params = TuckerProblem(
 shape=(5, 4, 3), num_factors=[3, 3, 2], noise=0.1
)
no_missing_data = MissingData()
solution, data = create_problem(tucker_specific_params, no_missing_data)

In [None]:
# Display the solution
print(solution)

In [None]:
# Display the data
print(data)

In [None]:
# The difference between the true solution and measured data
# should match the specified noise setting
diff = (solution.full() - data).norm() / solution.full().norm()
print(diff)
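
To make the Tucker structure concrete, the following NumPy sketch builds the dense tensor that a core and one factor matrix per mode represent. It is an illustration of the Tucker format itself, not of pyttb internals; the variable names are assumptions, and the sizes mirror `shape=(5, 4, 3)` with `num_factors=[3, 3, 2]`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Core tensor whose size matches num_factors=[3, 3, 2]
core = rng.standard_normal((3, 3, 2))

# One factor matrix per mode: (mode size) x (core size in that mode)
factors = [
    rng.standard_normal((5, 3)),
    rng.standard_normal((4, 3)),
    rng.standard_normal((3, 2)),
]

# full[i, j, k] = sum over a, b, c of core[a, b, c] * U0[i, a] * U1[j, b] * U2[k, c]
full = np.einsum("abc,ia,jb,kc->ijk", core, *factors)
print(full.shape)
```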

## Recreating the same test problem
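Problem generation still relies on NumPy's legacy global random state, so a problem can be regenerated only by resetting the global seed before each call to `create_problem`. See [#441](https://github.com/sandialabs/pyttb/issues/441).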

In [None]:
# Problem details
shape = [5, 4, 3]
num_factors = 3
seed = 123
missing_params = MissingData()
cp_specific_params = CPProblem(shape, num_factors=num_factors)

In [None]:
# Generate the first test problem
np.random.seed(seed)
solution_1, data_1 = create_problem(cp_specific_params, missing_params)

In [None]:
# Generate the second test problem
np.random.seed(seed)
solution_2, data_2 = create_problem(cp_specific_params, missing_params)

In [None]:
# Check that the solutions are identical
print(f"{solution_1.isequal(solution_2)=}")

# Check that the data are identical
print(f"{(data_1-data_2).norm()=}")
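
The same reseeding pattern can be seen in isolation with NumPy alone. In this sketch, `draw_factors` is a hypothetical helper (not part of pyttb) that resets the global seed and then draws random factor matrices.

```python
import numpy as np


def draw_factors(seed, shape=(5, 4, 3), num_factors=3):
    """Hypothetical helper: reset the global seed, then draw one matrix per mode."""
    np.random.seed(seed)
    return [np.random.randn(n, num_factors) for n in shape]


first = draw_factors(123)
second = draw_factors(123)
other = draw_factors(456)

# Same seed gives identical draws; a different seed does not
identical = all(np.array_equal(a, b) for a, b in zip(first, second))
differs = any(not np.array_equal(a, b) for a, b in zip(first, other))
print(identical, differs)
```

Any random draw made between the call to `np.random.seed` and the generation step shifts the global state, which is why the seed is reset immediately before each problem is created.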

## Options for creating factor matrices, core tensors, and weights

User-specified functions may be provided to generate the relevant components of a `ktensor` or `ttensor`.

In [None]:
# Example custom weight generator for CP Problems
cp_specific_params = CPProblem(shape=[5, 4, 3], num_factors=2, weight_generator=np.ones)
solution, _ = create_problem(cp_specific_params, missing_params)
print(f"{solution.weights}")

In [None]:
# Example custom core generator for Tucker
tucker_specific_params = TuckerProblem(
 shape=[5, 4, 3], num_factors=[2, 2, 2], core_generator=ttb.tenones
)
solution, _ = create_problem(tucker_specific_params, missing_params)
print(f"{solution.core}")
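
A custom generator does not have to be a built-in function. The sketch below assumes that a weight generator is any callable that, like `np.ones`, accepts a shape and returns an array of that shape. `decaying_weights` is a hypothetical name introduced for illustration.

```python
import numpy as np


def decaying_weights(shape):
    """Hypothetical generator: entries 1, 1/2, 1/4, ... arranged in `shape`."""
    count = int(np.prod(shape))
    return (0.5 ** np.arange(count)).reshape(shape)


# Called the same way np.ones would be called
example_weights = decaying_weights((3,))
print(example_weights)
```

If the assumption about the callable's signature holds, such a function could be passed as `weight_generator=decaying_weights` in the same way `np.ones` is passed above.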

## Create dense missing data problems
It's possible to create problems that have a percentage of missing data. The problem generator randomly creates the pattern of missing data.

In [None]:
# Specify 25% missing data
missing_data_params = MissingData(missing_ratio=0.25)

# Show an example of a randomly generated pattern
# (1 marks a known entry, 0 marks a missing entry)
print(missing_data_params.get_pattern(shape=[5, 4, 3]))

In [None]:
# Generate problem using a newly sampled pattern
solution, data = create_problem(cp_specific_params, missing_data_params)

In [None]:
# Show data (including noise) with missing entries zeroed out
print(data)
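
The effect of a dense missing-data pattern can be sketched with NumPy. This is an illustration under assumptions, not pyttb's implementation: it assumes exactly `missing_ratio` of the entries are selected uniformly at random and that missing entries are zeroed by elementwise multiplication with the pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (5, 4, 3)
missing_ratio = 0.25

total = int(np.prod(shape))
num_missing = round(missing_ratio * total)

# 1 marks a known entry, 0 marks a missing entry
pattern = np.ones(total)
pattern[rng.choice(total, size=num_missing, replace=False)] = 0
pattern = pattern.reshape(shape)

# Stand-in for noisy data; missing entries are zeroed out by the pattern
values = rng.standard_normal(shape)
masked = values * pattern
print(int(pattern.sum()), "known entries out of", total)
```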

## Creating sparse missing data problems
If `sparse_model` is set to `True`, the returned data is sparse. This should only be used with `missing_ratio >= 0.8`.

In [None]:
missing_data_params = MissingData(missing_ratio=0.8, sparse_model=True)

# Here is a candidate pattern of known data
print(missing_data_params.get_pattern([5, 4, 3]))

In [None]:
# Here is the data (including noise) with zeros not explicitly represented.
solution, data = create_problem(cp_specific_params, missing_data_params)
print(data)
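
When most entries are missing, storing only the known ones is cheaper than storing a dense array full of zeros. The NumPy sketch below shows the coordinate-style layout (one row of subscripts plus one value per known entry) that sparse tensor formats typically use. The variable names are illustrative and this is not pyttb's storage code.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (5, 4, 3)
total = int(np.prod(shape))
num_known = round(0.2 * total)  # missing_ratio = 0.8 leaves 20% known

# Boolean mask of known entries
known = np.zeros(total, dtype=bool)
known[rng.choice(total, size=num_known, replace=False)] = True
known = known.reshape(shape)

values = rng.standard_normal(shape)

# Coordinate format: subscripts and values of the known entries only
subs = np.argwhere(known)  # one row of (i, j, k) per known entry
vals = values[known]       # matching values, in the same row-major order
print(subs.shape, vals.shape)
```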

## Create missing data problems with pre-specified pattern
A specific pattern (dense or sparse) can be used to represent missing data. This is also currently the recommended approach for reproducibility.

In [None]:
# Grab a pattern from before
pattern = MissingData(missing_ratio=0.25).get_pattern([5, 4, 3])
missing_data_params = MissingData(missing_pattern=pattern)
solution, data = create_problem(cp_specific_params, missing_data_params)
print(data)

## Creating Sparse Problems (CP only)
If we assume each model parameter is the input to a Poisson process, then we can generate sparse test problems. This requires that all the factor matrices and the weight vector (lambda) be nonnegative. The default factor generator (`randn`) won't work since it produces both positive and negative values.

In [None]:
# Generate factor matrices with a few large entries in each column
# This will be the basis of our solution
shape = (20, 15, 10)
num_factors = 4
A = []
for n in range(len(shape)):
    # Start from uniform random (nonnegative) entries
    A.append(np.random.rand(shape[n], num_factors))
    for r in range(num_factors):
        # Boost a random subset of rows in column r
        p = np.random.permutation(np.arange(shape[n]))
        idx = p[1 : round(0.2 * shape[n])]
        A[n][idx, r] *= 10
S = ttb.ktensor(A)
# S.normalize(sort=True);

In [None]:
S.normalize(sort=True).weights

In [None]:
# Create sparse test problem based on the solution.
# `sparse_generation` controls how many insertions to make based on the solution.
# The weight vector of the solution is automatically rescaled to match the number of insertions.
existing_params = ExistingCPSolution(S, noise=0.0, sparse_generation=500)
print(f"{S.weights=}")
solution, data = create_problem(existing_params)
print(
 f"num_nonzeros: {data.nnz}\n"
 f"total_insertions: {np.sum(data.vals)}\n"
 f"original weights vs rescaled: {S.weights / solution.weights}"
)
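
One way to picture the sparse generation step is to treat a nonnegative CP model as a probability distribution over index tuples and draw a fixed number of insertions from it. The NumPy sketch below follows that idea. It is an assumption about the general technique rather than a copy of pyttb's code, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (20, 15, 10)
num_factors = 4
num_insertions = 500

# Nonnegative factor matrices and weights
factors = [rng.random((n, num_factors)) for n in shape]
weights = rng.random(num_factors)

# Make every column a probability vector, absorbing the scale into the weights
for k, factor in enumerate(factors):
    column_sums = factor.sum(axis=0)
    factors[k] = factor / column_sums
    weights = weights * column_sums
component_probs = weights / weights.sum()

# Pick a component for each insertion, then one index per mode from its column
components = rng.choice(num_factors, size=num_insertions, p=component_probs)
subs = np.empty((num_insertions, len(shape)), dtype=int)
for k, factor in enumerate(factors):
    for r in range(num_factors):
        mask = components == r
        subs[mask, k] = rng.choice(shape[k], size=mask.sum(), p=factor[:, r])

# Accumulate repeated subscripts into counts
counts = np.zeros(shape, dtype=int)
np.add.at(counts, tuple(subs.T), 1)
print(np.count_nonzero(counts), counts.sum())
```

The total count always equals the number of insertions, while the number of nonzeros can be smaller because the same index tuple may be drawn more than once.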